# @dustinvannoy Dustin Vannoy

Dustin Vannoy posts on YouTube about azure, databricks, engineering, and apache the most. They currently have [-----] followers and [--] posts still getting attention that total [---] engagements in the last [--] hours.

### Engagements: [---] [#](/creator/youtube::UCYdC0t9EFtyVAs0-cwqVCTw/interactions)

- [--] Week [---] -1.50%
- [--] Month [-----] -36%
- [--] Months [------] -34%
- [--] Year [------] +75%

### Mentions: [--] [#](/creator/youtube::UCYdC0t9EFtyVAs0-cwqVCTw/posts_active)

- [--] Months [--] -17%
- [--] Year [--] +150%

### Followers: [-----] [#](/creator/youtube::UCYdC0t9EFtyVAs0-cwqVCTw/followers)

- [--] Week [-----] +0.68%
- [--] Month [-----] +5.20%
- [--] Months [-----] +26%
- [--] Year [-----] +84%

### CreatorRank: undefined [#](/creator/youtube::UCYdC0t9EFtyVAs0-cwqVCTw/influencer_rank)

### Social Influence

**Social category influence** [technology brands](/list/technology-brands) [finance](/list/finance) [social networks](/list/social-networks) [stocks](/list/stocks) [events](/list/events)

**Social topic influence** [azure](/topic/azure), [databricks](/topic/databricks) #53, [engineering](/topic/engineering), [apache](/topic/apache), [how to](/topic/how-to), [in the](/topic/in-the), [ai](/topic/ai), [this is](/topic/this-is), [tutorial](/topic/tutorial), [san diego](/topic/san-diego)

**Top accounts mentioned or mentioned by** [@24chynowethdatabrickssystemtablesanintroductione11a06872405](/creator/undefined) [@abrahampabbathiintegratingmicrosoftfabricwithdatabricksf2203f65b224](/creator/undefined)

**Top assets mentioned** [Microsoft Corp. (MSFT)](/topic/microsoft) [Alphabet Inc Class A (GOOGL)](/topic/$googl)

### Top Social Posts

Top posts by engagements in the last [--] hours

"dbt + Databricks Overview: SQL-based ETL In this video I introduce dbt and how it integrates with Databricks to support SQL-based ETL. This video is to teach you: [--]. Why dbt is helpful for building SQL-based data pipelines on any Data Warehouse platform. [--]. The basics of how you use dbt with Databricks. [--]. A few key benefits of using Databricks. **All thoughts and opinions are my own** References: dbt Cloud vs dbt Core: https://www.getdbt.com/product/dbt-core-vs-dbt-cloud dbt on Lakehouse Design Patterns:" [YouTube Link](https://youtube.com/watch?v=lyb9qKTMasI) 2025-05-21T12:01Z [----] followers, 12.4K engagements

"Azure Stream Analytics with Event Hubs In this video I introduce Azure Stream Analytics by walking through building a Stream Analytics Job from scratch using Azure Event Hubs as the input and output. This video includes all the setup steps and should have enough detail for you to get your own job set up for stream processing with Event Hubs and Stream Analytics. More from Dustin: Website: https://dustinvannoy.com Twitter: https://twitter.com/dustinvannoy LinkedIn: https://www.linkedin.com/in/dustinvannoy Github: https://github.com/datakickstart CHAPTERS: 0:00 Intro 1:03 Create Job 2:34 Create" [YouTube Link](https://youtube.com/watch?v=83e0HCmLFfY) 2021-11-05T13:00Z [----] followers, [----] engagements
"7 Best Practices for Development and CICD on Databricks In this video I share why developer experience and best practices are important and why I think Databricks offers the best developer experience for a data platform. I'll cover the high-level developer lifecycle and [--] ways to improve your team's development process with a goal of better quality and reliability. Stay tuned for follow-up videos that cover some of the key topics discussed here. Blog post with more details and deep dives: Coming soon * All thoughts and opinions are my own * More from Dustin: Website: https://dustinvannoy.com" [YouTube Link](https://youtube.com/watch?v=IWS2AzkTKl0) 2024-12-20T13:38Z [----] followers, [----] engagements

"Monitoring Databricks with System Tables In this video I focus on a different side of monitoring: What do the Databricks system tables offer me for monitoring? How much does this overlap with the application logs and Spark metrics? Databricks System Tables are a public preview feature that can be enabled if you have Unity Catalog on your workspace. I introduce the concept in the first [--] minutes then summarize where this is most helpful in the last [--] minutes. In between are some example queries and table explanations. * All thoughts and opinions are my own * Blog post with more detail:" [YouTube Link](https://youtube.com/watch?v=VfqLBJvqomM) 2024-02-22T15:20Z [----] followers, [----] engagements

"Databricks Asset Bundles: Advanced Examples Databricks Asset Bundles are now GA (Generally Available). As more Databricks users start to rely on Databricks Asset Bundles (DABs) for their development and deployment workflows, let's look at some advanced patterns people have been asking for examples to help them get started. Blog post with these examples: https://dustinvannoy.com/2024/06/25/databricks-asset-bundles-advanced Intro post: https://dustinvannoy.com/2023/10/03/databricks-ci-cd-intro-to-asset-bundles-dabs * All thoughts and opinions are my own * References: Datakickstart DABs repo:" [YouTube Link](https://youtube.com/watch?v=ZuQzIbRoFC4) 2024-06-25T12:00Z [----] followers, 17.3K engagements

"Azure Synapse Spark Monitoring with Log Analytics Log Analytics provides a way to easily query logs and set up alerts in Azure. This provides a huge help when monitoring Apache Spark. In this video I walk through setting up Azure Synapse Apache Spark to connect to Log Analytics, adding custom log messages from PySpark, and how to query logs from Spark log4j using Azure Log Analytics. Written tutorial and troubleshooting steps: https://dustinvannoy.com/2022/05/12/monitor-synapse-spark-with-log-analytics/ More from Dustin: Website: dustinvannoy.com Twitter: @dustinvannoy Github:" [YouTube Link](https://youtube.com/watch?v=j1W5lJuohq8) 2022-05-13T13:30Z [----] followers, [----] engagements

"Databricks VS Code Extension: Serverless Compute The Databricks Extension for Visual Studio Code now supports connecting to Serverless compute when running with Databricks Connect or running as a workflow. This video provides a quick explanation of when this is a good option and a short demo of it in action. **All thoughts and opinions are my own** More from Dustin: Website: https://dustinvannoy.com LinkedIn: https://linkedin.com/in/dustinvannoy Github: https://github.com/datakickstart Databricks Azure Databricks VS Code Visual Studio IDE Developer Experience Databricks Connect Serverless" [YouTube Link](https://youtube.com/watch?v=20D9S_r9ZTM) 2025-02-28T13:00Z [----] followers, [----] engagements
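The Databricks Connect entry above is about running local code against serverless compute. A minimal sketch of that pattern, assuming a recent databricks-connect release, an already-configured authentication profile, and a sample table name; the exact builder option for serverless varies by version:

```python
# Hypothetical sketch: run a small query from a local IDE against Databricks
# serverless compute via Databricks Connect. Assumes databricks-connect is
# installed and authentication is configured (e.g. a profile in ~/.databrickscfg).
from databricks.connect import DatabricksSession

# Recent databricks-connect releases let the builder request serverless compute;
# older releases require a cluster_id instead (assumption: check your version).
spark = DatabricksSession.builder.serverless(True).getOrCreate()

df = spark.read.table("samples.nyctaxi.trips")  # sample table name is an assumption
print(df.limit(5).toPandas())
```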
"Azure Synapse Spark Scala In this video I share with you about Apache Spark using Scala. We'll walk through a quick demo on Azure Synapse Analytics, an integrated platform for analytics within the Microsoft Azure cloud. This short demo is meant for those who are curious about Spark with Scala or just want to get a peek at Spark in Azure Synapse. If you are new to Apache Spark, just know that it is a popular framework for data engineers that can be run in a variety of environments. It is popular because it enables distributed data processing with a relatively simple API. If you want to see examples" [YouTube Link](https://youtube.com/watch?v=8hG7CSmbjMk) 2021-02-04T13:00Z [----] followers, [----] engagements

"Introducing DBRX Open LLM - Data Engineering San Diego (May 2024) A special event presented by Data Engineering San Diego, Databricks User Group and San Diego Software Engineers. Presentation: Introducing DBRX - Open LLM by Databricks By: Vitaliy Chiley, Head of LLM Pretraining for Mosaic at Databricks. DBRX is an open-source LLM by Databricks which, when recently released, outperformed established open-source models on a set of standard benchmarks. Join us to learn firsthand about how the Mosaic Research team built DBRX and why it matters. This talk will cover the architecture model evaluation" [YouTube Link](https://youtube.com/watch?v=NGW5y5A1JXA) 2024-06-04T23:03Z [----] followers, [---] engagements

"Databricks CI/CD: Azure DevOps Pipeline + DABs Many organizations choose Azure DevOps for automated deployments on Azure. When deploying to Databricks you can take similar deploy pipeline code that you use for other projects but use it with Databricks Asset Bundles. This video shows most of the steps involved in setting this up by following along with a blog post that shares example code and steps. * All thoughts and opinions are my own * Blog post on DABs with Azure DevOps:" [YouTube Link](https://youtube.com/watch?v=SZM49lGovTg) 2024-08-19T14:30Z [----] followers, 28.4K engagements

"Spark Monitoring: Basics In this video I cover how to use the default UI for monitoring Apache Spark. I use Azure Databricks to demonstrate but most of the methods are the same in any Spark environment. I focus most on how I use it in my day to day work as a Data Engineer. Topics included: Spark UI, query plan, basic Ganglia Metrics and driver logs. Monitoring Streams with Custom Streaming Query Listener: https://youtu.be/iqIdmCvSwwU?t=268 More from Dustin: Website: dustinvannoy.com Twitter: @dustinvannoy Github: https://github.com/datakickstart" [YouTube Link](https://youtube.com/watch?v=Sm98zGfAMvA) 2021-07-26T15:16Z [----] followers, 19.7K engagements

"Claude Code: [--] Essentials for Data Engineering This video is a guide for data professionals (Data Engineers, Data Scientists and Analytics Engineers) on adopting AI for development using Claude Code. The main idea: success with Claude Code comes down to managing context and memory well. Claude Code works as an agent: it handles multi-file edits, runs tests and fixes errors pretty autonomously. The [--] essentials you should understand about Claude Code: [--]. claude.md - Your main instruction file. It tells Claude your project's rules, frameworks and design principles from the start. [--]. Skills -" [YouTube Link](https://youtube.com/watch?v=YnIWW88l0mc) 2026-01-08T14:01Z [----] followers, [----] engagements
"Databricks VS Code Extension v2: Setup and Feature Demo Databricks Visual Studio Code Extension v2, the next major release, is now generally available. In this video I walk through the initial setup and the main ways you will run code and deploy resources using this extension. I also provide some key tips to make sure you don't get stuck along the way. * All thoughts and opinions are my own * References: Databricks blog: https://www.databricks.com/blog/simplified-faster-development-new-capabilities-databricks-vs-code-extension Databricks Asset Bundle training:" [YouTube Link](https://youtube.com/watch?v=o4qMWHgT1zM) 2024-09-26T12:00Z [----] followers, 19.2K engagements

"Data + AI Summit 2023: Key Takeaways Data + AI Summit key takeaways from a Data Engineer's perspective. Which features coming to Apache Spark and to Databricks are most exciting for data engineering? I cover that plus a decent amount of AI and LLM talk in this informal video. See the blog post for a bit more thought-out summaries and links to many of the keynote demos related to the features I am excited about. Blog post: https://dustinvannoy.com/2023/06/30/dais-2023-data-engineer-takeaways/ * All thoughts and opinions are my own * Databricks Apache Spark Azure Databricks DataAISummit Data AI" [YouTube Link](https://youtube.com/watch?v=32qa64reZ0I) 2023-07-01T13:35Z [----] followers, [---] engagements

"Databricks + Cursor IDE: Step-by-Step AI Coding Tutorial The tech industry has evolved rapidly and AI coding tools are changing how we develop. These tools have amazing potential to speed up the development process for data professionals building Databricks-focused projects. While we see advanced AI features in the online Databricks workspace, there are also amazing capabilities in tools you install on your development machine such as Cursor and Claude Code. In this video I explain recommendations for using Cursor with Databricks. This includes using some development tools that I have shared about" [YouTube Link](https://youtube.com/watch?v=Ii2LuEJ0gpc) 2025-09-30T12:00Z [----] followers, [----] engagements

"Parallel table ingestion with a Spark Notebook (PySpark + Threading) If we want to kick off a single Apache Spark notebook to process a list of tables we can write the code easily. The simple code to loop through the list of tables ends up running one table after another (sequentially). If none of these tables are very big, it is quicker to have Spark load tables concurrently (in parallel) using multithreading. There are some different options of how to do this, but I am sharing the easiest way I have found when working with a PySpark notebook in Databricks, Azure Synapse Spark, Jupyter or" [YouTube Link](https://youtube.com/watch?v=hFGu2enSjTY) 2022-05-06T14:20Z [----] followers, 17.3K engagements
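The parallel table ingestion entry above describes loading several small tables concurrently from one notebook. A minimal sketch of that pattern with a thread pool, assuming hypothetical source/target table names and a notebook-provided `spark` session:

```python
# Minimal sketch: load a list of tables concurrently from a single PySpark
# notebook using a thread pool. Table names and the copy logic are placeholders.
from concurrent.futures import ThreadPoolExecutor, as_completed

tables = ["sales", "customers", "products"]  # hypothetical list of small tables

def load_table(name: str) -> str:
    # Each thread submits its own Spark job; Spark runs them concurrently.
    df = spark.read.table(f"source_db.{name}")
    df.write.mode("overwrite").saveAsTable(f"target_db.{name}")
    return name

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(load_table, t) for t in tables]
    for f in as_completed(futures):
        print(f"Finished loading {f.result()}")
```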
"Databricks CI/CD: Intro to Databricks Asset Bundles (DABs) Databricks Asset Bundles provide a way to use the command line to deploy and run a set of Databricks assets - like notebooks, Python code, Delta Live Tables pipelines and workflows. This is useful both for running jobs that are being developed locally and for automating CI/CD processes that will deploy and test code changes. In this video I explain why Databricks Asset Bundles are a good option for CI/CD and demo how to initialize a project and set up your first GitHub Action using DABs. Blog post with extra examples:" [YouTube Link](https://youtube.com/watch?v=uG0dTF5mmvc) 2023-10-04T12:00Z [----] followers, 40.2K engagements

"Claude Code: [--] Essentials for Data Engineering This video is a guide for data professionals (Data Engineers, Data Scientists and Analytics Engineers) on adopting AI for development using Claude Code. The main idea: success with Claude Code comes down to managing context and memory well. Claude Code works as an agent: it handles multi-file edits, runs tests and fixes errors pretty autonomously. The [--] essentials you should understand about Claude Code: [--]. claude.md - Your main instruction file. It tells Claude your project's rules, frameworks and design principles from the start. [--]. Skills -" [YouTube Link](https://youtube.com/watch?v=YnIWW88l0mc) 2026-01-08T14:01Z [----] followers, [----] engagements

"Databricks + Cursor IDE: Step-by-Step AI Coding Tutorial The tech industry has evolved rapidly and AI coding tools are changing how we develop. These tools have amazing potential to speed up the development process for data professionals building Databricks-focused projects. While we see advanced AI features in the online Databricks workspace, there are also amazing capabilities in tools you install on your development machine such as Cursor and Claude Code. In this video I explain recommendations for using Cursor with Databricks. This includes using some development tools that I have shared about" [YouTube Link](https://youtube.com/watch?v=Ii2LuEJ0gpc) 2025-09-30T12:00Z [----] followers, [----] engagements

"dbt + Databricks Overview: SQL-based ETL In this video I introduce dbt and how it integrates with Databricks to support SQL-based ETL. This video is to teach you: [--]. Why dbt is helpful for building SQL-based data pipelines on any Data Warehouse platform. [--]. The basics of how you use dbt with Databricks. [--]. A few key benefits of using Databricks. **All thoughts and opinions are my own** References: dbt Cloud vs dbt Core: https://www.getdbt.com/product/dbt-core-vs-dbt-cloud dbt on Lakehouse Design Patterns:" [YouTube Link](https://youtube.com/watch?v=lyb9qKTMasI) 2025-05-21T12:01Z [----] followers, 12.4K engagements

"Databricks VS Code Extension: Serverless Compute The Databricks Extension for Visual Studio Code now supports connecting to Serverless compute when running with Databricks Connect or running as a workflow. This video provides a quick explanation of when this is a good option and a short demo of it in action. **All thoughts and opinions are my own** More from Dustin: Website: https://dustinvannoy.com LinkedIn: https://linkedin.com/in/dustinvannoy Github: https://github.com/datakickstart Databricks Azure Databricks VS Code Visual Studio IDE Developer Experience Databricks Connect Serverless" [YouTube Link](https://youtube.com/watch?v=20D9S_r9ZTM) 2025-02-28T13:00Z [----] followers, [----] engagements

"Unity Catalog OSS Spotlight Unity Catalog Open Source Software (OSS) is a compelling project and there are some key benefits to working with it locally. In this video I share reasons for using the open source project Unity Catalog (UC) and walk through some of the setup and testing I did to create and write to tables from Apache Spark. *All thoughts and opinions are my own* Links: Blog post - https://dustinvannoy.com/2025/01/30/oss-spotlight-unity-catalog Roadmap 2024Q4 - https://github.com/unitycatalog/unitycatalog/discussions/411 Events - https://lu.ma/unity-catalog More from Dustin:" [YouTube Link](https://youtube.com/watch?v=W3TsKgYJkz4) 2025-01-30T13:00Z [----] followers, [----] engagements
"Developer Best Practices on Databricks: Git Tests and Automated Deployment Data engineers and data scientists benefit from using best practices learned from years of software development. This video walks through [--] of the most important practices to build quality analytics solutions. It is meant to be an overview of what following these practices looks like for a Databricks developer. This video covers: - Version control basics and demo of Git integration with Databricks workspace - Automated tests with pytest for unit testing and Databricks Workflows for integration testing - CI/CD including" [YouTube Link](https://youtube.com/watch?v=MolLJRD8kgM) 2025-01-06T13:00Z [----] followers, 14.9K engagements

"7 Best Practices for Development and CICD on Databricks In this video I share why developer experience and best practices are important and why I think Databricks offers the best developer experience for a data platform. I'll cover the high-level developer lifecycle and [--] ways to improve your team's development process with a goal of better quality and reliability. Stay tuned for follow-up videos that cover some of the key topics discussed here. Blog post with more details and deep dives: Coming soon * All thoughts and opinions are my own * More from Dustin: Website: https://dustinvannoy.com" [YouTube Link](https://youtube.com/watch?v=IWS2AzkTKl0) 2024-12-20T13:38Z [----] followers, [----] engagements
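The two best-practices entries above mention pytest unit tests for Databricks code. A minimal sketch of what such a test can look like, assuming a hypothetical `add_ingest_date` transformation under test and a local SparkSession so the test runs without a cluster:

```python
# Minimal sketch: unit test a small PySpark transformation with pytest.
# add_ingest_date is a hypothetical function under test.
import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


def add_ingest_date(df):
    """Example transformation: add an ingest_date column."""
    return df.withColumn("ingest_date", F.current_date())


@pytest.fixture(scope="session")
def spark():
    # Local SparkSession so tests run on the developer machine or CI agent.
    return SparkSession.builder.master("local[1]").appName("unit-tests").getOrCreate()


def test_add_ingest_date_adds_column(spark):
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
    result = add_ingest_date(df)
    assert "ingest_date" in result.columns
    assert result.count() == 2
```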
"Databricks VS Code: Multiple Projects In VS Code Workspace In this video I cover a specific option for working with the Databricks Visual Studio Code Extension: what if I have many project folders, each as their own bundle, but I want to work in the same VS Code workspace? I talk through a couple of ways to work with this and show how to switch the active project folder in order to run files from different bundles. You may need this if: - VS Code is only opening one Databricks project but you want multiple open in the same session. - You are getting an error on a Python script like "Error: init: listener: timed" [YouTube Link](https://youtube.com/watch?v=p2teBtk6Duk) 2024-10-15T14:36Z [----] followers, [----] engagements

"Databricks VS Code Extension v2: Upgrade steps In this short video I show you how to upgrade a project from using Databricks Visual Studio Code version [--] to using the new version. There are a few key setup steps included and a quick glimpse at the new Databricks run button. For a more complete view of using the Databricks Visual Studio Code extension see this video: https://www.youtube.com/watch?v=o4qMWHgT1zM * All thoughts and opinions are my own * More from Dustin: Website: https://dustinvannoy.com LinkedIn: https://linkedin.com/in/dustinvannoy Github: https://github.com/datakickstart" [YouTube Link](https://youtube.com/watch?v=XMYdwr4lc6w) 2024-10-02T14:35Z [----] followers, [---] engagements

"Databricks VS Code Extension v2: Setup and Feature Demo Databricks Visual Studio Code Extension v2, the next major release, is now generally available. In this video I walk through the initial setup and the main ways you will run code and deploy resources using this extension. I also provide some key tips to make sure you don't get stuck along the way. * All thoughts and opinions are my own * References: Databricks blog: https://www.databricks.com/blog/simplified-faster-development-new-capabilities-databricks-vs-code-extension Databricks Asset Bundle training:" [YouTube Link](https://youtube.com/watch?v=o4qMWHgT1zM) 2024-09-26T12:00Z [----] followers, 19.2K engagements

"Databricks CI/CD: Azure DevOps Pipeline + DABs Many organizations choose Azure DevOps for automated deployments on Azure. When deploying to Databricks you can take similar deploy pipeline code that you use for other projects but use it with Databricks Asset Bundles. This video shows most of the steps involved in setting this up by following along with a blog post that shares example code and steps. * All thoughts and opinions are my own * Blog post on DABs with Azure DevOps:" [YouTube Link](https://youtube.com/watch?v=SZM49lGovTg) 2024-08-19T14:30Z [----] followers, 28.4K engagements

"Databricks Asset Bundles: Advanced Examples Databricks Asset Bundles are now GA (Generally Available). As more Databricks users start to rely on Databricks Asset Bundles (DABs) for their development and deployment workflows, let's look at some advanced patterns people have been asking for examples to help them get started. Blog post with these examples: https://dustinvannoy.com/2024/06/25/databricks-asset-bundles-advanced Intro post: https://dustinvannoy.com/2023/10/03/databricks-ci-cd-intro-to-asset-bundles-dabs * All thoughts and opinions are my own * References: Datakickstart DABs repo:" [YouTube Link](https://youtube.com/watch?v=ZuQzIbRoFC4) 2024-06-25T12:00Z [----] followers, 17.3K engagements

"Introducing DBRX Open LLM - Data Engineering San Diego (May 2024) A special event presented by Data Engineering San Diego, Databricks User Group and San Diego Software Engineers. Presentation: Introducing DBRX - Open LLM by Databricks By: Vitaliy Chiley, Head of LLM Pretraining for Mosaic at Databricks. DBRX is an open-source LLM by Databricks which, when recently released, outperformed established open-source models on a set of standard benchmarks. Join us to learn firsthand about how the Mosaic Research team built DBRX and why it matters. This talk will cover the architecture model evaluation" [YouTube Link](https://youtube.com/watch?v=NGW5y5A1JXA) 2024-06-04T23:03Z [----] followers, [---] engagements

"Monitoring Databricks with System Tables In this video I focus on a different side of monitoring: What do the Databricks system tables offer me for monitoring? How much does this overlap with the application logs and Spark metrics? Databricks System Tables are a public preview feature that can be enabled if you have Unity Catalog on your workspace. I introduce the concept in the first [--] minutes then summarize where this is most helpful in the last [--] minutes. In between are some example queries and table explanations. * All thoughts and opinions are my own * Blog post with more detail:" [YouTube Link](https://youtube.com/watch?v=VfqLBJvqomM) 2024-02-22T15:20Z [----] followers, [----] engagements

"Databricks Monitoring with Log Analytics - Updated for DBR 11.3+ In this video I show the latest way to set up and use Log Analytics for storing and querying your Databricks logs. My prior video covered the steps for earlier Databricks Runtime versions (prior to 11.0). This video covers using the updated code for Databricks Runtime [----] [----] or [----]. There are various options for monitoring Databricks, but since Log Analytics provides a way to easily query logs and set up alerts in Azure, you may choose to send your Databricks logs there as well. * All thoughts and opinions are my own * Blog post" [YouTube Link](https://youtube.com/watch?v=CVzGWWSGWGg) 2024-01-08T13:00Z [----] followers, [----] engagements
"Databricks CI/CD: Intro to Databricks Asset Bundles (DABs) Databricks Asset Bundles provide a way to use the command line to deploy and run a set of Databricks assets - like notebooks, Python code, Delta Live Tables pipelines and workflows. This is useful both for running jobs that are being developed locally and for automating CI/CD processes that will deploy and test code changes. In this video I explain why Databricks Asset Bundles are a good option for CI/CD and demo how to initialize a project and set up your first GitHub Action using DABs. Blog post with extra examples:" [YouTube Link](https://youtube.com/watch?v=uG0dTF5mmvc) 2023-10-04T12:00Z [----] followers, 40.2K engagements

"Data + AI Summit 2023: Key Takeaways Data + AI Summit key takeaways from a Data Engineer's perspective. Which features coming to Apache Spark and to Databricks are most exciting for data engineering? I cover that plus a decent amount of AI and LLM talk in this informal video. See the blog post for a bit more thought-out summaries and links to many of the keynote demos related to the features I am excited about. Blog post: https://dustinvannoy.com/2023/06/30/dais-2023-data-engineer-takeaways/ * All thoughts and opinions are my own * Databricks Apache Spark Azure Databricks DataAISummit Data AI" [YouTube Link](https://youtube.com/watch?v=32qa64reZ0I) 2023-07-01T13:35Z [----] followers, [---] engagements

"PySpark Kickstart - Read and Write Data with Apache Spark Every Spark pipeline involves reading data from a data source or table and often ends with writing data. In this video we walk through some of the most common formats and cloud storage used for reading and writing with Spark. Includes some guidance on authenticating to ADLS, OneLake, S3, Google Cloud Storage, Azure SQL Database and Snowflake. Once you have watched this tutorial, go find a free dataset and try to read and write within your environment. * All thoughts and opinions are my own * For links to the code and more information on" [YouTube Link](https://youtube.com/watch?v=EN1TaVEkqXg) 2023-06-22T12:00Z [----] followers, [----] engagements

"Data Engineering San Diego - Intro to dbt Data Engineering San Diego group monthly meeting for a presentation followed by group questions and responses. We will start the stream as close to 5:30 as we can. dbt - Introduction and demo As a data practitioner, have you ever lost trust from a stakeholder because of inaccurate data on a dashboard? Could that have been fixed by testing that the primary keys were actually unique? Are you a SQL analyst that is intimately familiar with the business logic needed for a use case but are dependent on someone else to build the pipeline for you? In this session" [YouTube Link](https://youtube.com/watch?v=y-keGmph-nc) 2023-05-19T13:58Z [----] followers, [---] engagements

"Spark SQL Kickstart: Your first Spark SQL application Get hands on with Spark SQL to build your first data pipeline. In this video I walk you through how to read, transform and write the NYC Taxi dataset, which can be found on Databricks, Azure Synapse or downloaded from the web to wherever you run Apache Spark. Once you have watched and followed along with this tutorial, go find a free dataset and try to write your own Spark application. * All thoughts and opinions are my own * For links to the code and more information on this course you can visit my website:" [YouTube Link](https://youtube.com/watch?v=RuGm2SmxCWk) 2023-05-18T12:00Z [----] followers, [----] engagements
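The two Kickstart entries above walk through reading, transforming and writing a dataset with Spark. A minimal sketch of that flow, assuming a hypothetical taxi CSV path, hypothetical column names, and an existing `spark` session:

```python
# Minimal sketch: read a CSV, apply a simple transform with Spark SQL,
# then write the result as a Delta table. The input path, column names and
# output table name are placeholders for whatever dataset you use.
raw = (
    spark.read.option("header", "true")
    .option("inferSchema", "true")
    .csv("/data/nyc_taxi/yellow_tripdata.csv")  # hypothetical path
)

raw.createOrReplaceTempView("trips")
summary = spark.sql("""
    SELECT payment_type, COUNT(*) AS trip_count, ROUND(AVG(total_amount), 2) AS avg_total
    FROM trips
    GROUP BY payment_type
""")

summary.write.format("delta").mode("overwrite").saveAsTable("taxi_payment_summary")
```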
"PySpark Kickstart - Your first Apache Spark data pipeline Get hands on with Python and PySpark to build your first data pipeline. In this video I walk you through how to read, transform and write the NYC Taxi dataset, which can be found on Databricks, Azure Synapse or downloaded from the web to wherever you run Apache Spark. Once you have watched and followed along with this tutorial, go find a free dataset and try to write your own PySpark application. Pro tip: Search for the Spark equivalent of functions you use in other programming languages (including SQL). Many will exist in the" [YouTube Link](https://youtube.com/watch?v=9QvMSiPtlnU) 2023-05-02T11:00Z [----] followers, [----] engagements

"Data Engineering San Diego - Intro to Large Language Models Special edition of the Data Engineering group where Data Engineering San Diego partnered with San Diego Software Engineers for an event. * We had a lot of in-person attendees and a lot of conversation, so the mic could not be situated to catch all of the audio well. We try to make the livestream helpful for those that cannot attend in person, but the room mics do not pick up enough conversation to provide as good of a virtual experience as we would like. * Introduction to Large Language Models for Software/Data Engineers Large Language" [YouTube Link](https://youtube.com/watch?v=4vwmRfQZD78) 2023-04-26T14:47Z [----] followers, [---] engagements

"Spark Environment - Azure Databricks Trial In this video I cover how to set up a free Azure Trial and spin up a free Azure Databricks Trial. This is a great way to have an option for testing out Databricks and learning Apache Spark on Azure. Once set up, you will see how to run a very simple test notebook. * All thoughts and opinions are my own * Additional links: Setup Databricks on AWS - https://youtu.be/gEDS5DOUgY8 Setup Databricks on Google Cloud - https://youtu.be/hquLYNN8nz8 Azure Databricks setup documentation: https://learn.microsoft.com/en-us/azure/databricks/getting-started/ More from" [YouTube Link](https://youtube.com/watch?v=VE8Z1E92eXA) 2023-04-19T11:00Z [----] followers, [---] engagements

"Spark Environment - Databricks Community Edition In this video I cover how to set up a free Databricks Community Edition environment. This is a great way to have an option for testing out Databricks and learning Apache Spark, and it doesn't expire after [--] days. It is limited in functionality and scalability though, so you won't be able to run a realistic proof of concept on this environment. Once set up, you will see how to run a very simple test notebook. If you want to see how to set up a trial or more permanent Databricks environment in Azure, see my other video - Coming soon * All thoughts" [YouTube Link](https://youtube.com/watch?v=Onwt8Twq3fs) 2023-04-11T13:17Z [----] followers, [----] engagements

"Apache Spark DataKickstart - Introduction to Spark In this video I provide an introduction to Apache Spark as part of my YouTube course Apache Spark DataKickstart. This video covers why Spark is popular, what it really is, and a bit about ways to run Apache Spark. Please check out other videos in this series by selecting the relevant playlist or subscribe and turn on notifications for new videos (coming soon). * All thoughts and opinions are my own * Learn more about Apache Spark with one of these options (including a few paid options): Databricks Training -" [YouTube Link](https://youtube.com/watch?v=0kQ7Iq_lG-k) 2023-03-23T12:59Z [----] followers, [----] engagements
"Unity Catalog setup for Azure Databricks In this video I walk through setting up Unity Catalog on Azure and quickly exploring the cataloging features for a couple of tables with a workflow. This includes setting up storage and the access connector, then a quick walk through of lineage and other metadata tracked at the table level. * All thoughts and opinions are my own * Learn more about Unity Catalog with one of the below videos: Short intro from Databricks - https://www.youtube.com/watch?v=U1Ez5LNzl48 Data Lineage - https://www.youtube.com/watch?v=8wGUnXhISz0 Deeper dive introduction -" [YouTube Link](https://youtube.com/watch?v=-RwzDRVgjLc) 2023-03-01T12:00Z [----] followers, 17.2K engagements

"Visual Studio Code Extension for Databricks In this video I show how I installed, configured and tested out the VS Code extension for Azure Databricks. This provides a way to develop PySpark code in your Visual Studio Code IDE and run the code on a Databricks cluster. It works well with Databricks Git Repos so you can keep your team in sync whether they work in VS Code or in Notebooks on the Databricks workspace. IMPORTANT UPDATE to how I explained this in the video: The repo used for syncing from local will not be an existing Databricks repo if using the updated version (0.3.0+). This is to" [YouTube Link](https://youtube.com/watch?v=Quh1TuJQurA) 2023-02-21T12:00Z [----] followers, 20.4K engagements

"Parallel Load in Spark Notebook - Questions Answered In this video I address questions from the tutorial I did on Parallel Table Ingestion with Spark Notebooks. See the chapters outline below for which questions are addressed. Original video that questions came from: https://dustinvannoy.com/2022/05/06/parallel-ingest-spark-notebook/ Q&A Writeup: https://dustinvannoy.com/2023/01/09/questions-answered-parallel-spark-notebook Running in Scala: https://docs.databricks.com/notebooks/notebook-workflows.html#run-multiple-notebooks-concurrently More from Dustin: Website: https://dustinvannoy.com" [YouTube Link](https://youtube.com/watch?v=oOFrUm6JC-0) 2023-01-09T14:32Z [----] followers, [----] engagements

"Delta Change Feed and Delta Merge pipeline (extended demo) This video shows an extended demo of a pipeline that loads refined (silver) and curated (gold) tables. It complements the demo from the session "Data Ingestion - Practical Data Loading with Azure" that was part of PASS Data Community Summit [----]. It shows use of the Delta Lake Change Data Feed and the Merge command to track and process only inserts, updates and deletes. Related content: Concurrent data ingestion in Spark notebooks - https://youtu.be/hFGu2enSjTY Databricks blog on change data feed -" [YouTube Link](https://youtube.com/watch?v=6Qf2C0i9oFU) 2022-11-16T17:04Z [----] followers, [----] engagements
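The Delta Change Feed entry above tracks inserts, updates and deletes and merges them into downstream tables. A minimal sketch of that pattern, assuming hypothetical table names and key column, a change data feed already enabled on the source, and an existing `spark` session:

```python
# Minimal sketch: read changes from a Delta table's change data feed and
# MERGE them into a downstream (silver) table. Table names, key column and
# starting version are placeholders; deletes are not handled in this sketch.
from delta.tables import DeltaTable

changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 10)  # placeholder version
    .table("bronze.orders")
    .filter("_change_type IN ('insert', 'update_postimage')")
)

target = DeltaTable.forName(spark, "silver.orders")
(
    target.alias("t")
    .merge(changes.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```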
"Data Engineering SD: Rise of Immediate Intelligence - Apache Druid Presenter: Sergio Ferragut - Imply Decision making is changing: Apache Druid is a new type of database for creating the next generation of analytics applications that maximize flexible exploration over fresh, fast-arriving data. In this talk Sergio Ferragut (Developer Advocate in Imply's Community Team) introduces these new "immediate intelligence" applications, tells the story of Druid's emergence and describes how data pipelines built with Druid differ from those you may already be familiar with. Apache Druid Imply Data" [YouTube Link](https://youtube.com/watch?v=h5_Ec5smlho) 2022-07-22T13:55Z [----] followers, [---] engagements

"Azure Synapse integration with Microsoft Purview data catalog In this video I show how to populate the Microsoft Purview data map and data catalog with metadata about Azure Synapse Analytics datasets. I cover a brief explanation of these two important parts of Microsoft Purview, then show how to set up permissions and ingest metadata about tables in Synapse Serverless SQL and Synapse Dedicated SQL pools. NOTE: This will cost money to have the Purview resource and to run scans. You are responsible to look at possible costs before running in your environment. I spent at least $30 on Purview related" [YouTube Link](https://youtube.com/watch?v=XrJwVD8QA4g) 2022-07-19T12:00Z [----] followers, [----] engagements

"Adi Polak - Chaos Engineering - Managing Stages in a Complex Data Flow - Data Engineering SD Chaos Engineering and how to manage data stages in Large-Scale Complex Data Flow Presenter: Adi Polak A complex data flow is a set of operations to extract information from multiple sources, copy them into multiple data targets, while using extract transformations, joins, filters and sorts to refine the results. These are precisely the capabilities that the new open modern data stack provides us. Spark and other tools allow us to develop complex data flow on large-scale data. Chaos Engineering concepts" [YouTube Link](https://youtube.com/watch?v=vMWwYWpqQZs) 2022-06-21T12:30Z [----] followers, [---] engagements

"Azure Synapse Spark Monitoring with Log Analytics Log Analytics provides a way to easily query logs and set up alerts in Azure. This provides a huge help when monitoring Apache Spark. In this video I walk through setting up Azure Synapse Apache Spark to connect to Log Analytics, adding custom log messages from PySpark, and how to query logs from Spark log4j using Azure Log Analytics. Written tutorial and troubleshooting steps: https://dustinvannoy.com/2022/05/12/monitor-synapse-spark-with-log-analytics/ More from Dustin: Website: dustinvannoy.com Twitter: @dustinvannoy Github:" [YouTube Link](https://youtube.com/watch?v=j1W5lJuohq8) 2022-05-13T13:30Z [----] followers, [----] engagements

"Parallel table ingestion with a Spark Notebook (PySpark + Threading) If we want to kick off a single Apache Spark notebook to process a list of tables we can write the code easily. The simple code to loop through the list of tables ends up running one table after another (sequentially). If none of these tables are very big, it is quicker to have Spark load tables concurrently (in parallel) using multithreading. There are some different options of how to do this, but I am sharing the easiest way I have found when working with a PySpark notebook in Databricks, Azure Synapse Spark, Jupyter or" [YouTube Link](https://youtube.com/watch?v=hFGu2enSjTY) 2022-05-06T14:20Z [----] followers, 17.3K engagements

"SQL Server On Docker + deploy DB to Azure In this video I show how to set up SQL Server on Docker, attach a database from a local mdf file and then deploy to an Azure SQL database. What I like about using Docker for this work is that I can script out the setup and only keep the container and image around when I need it. In the video I set it up using a Windows [--] laptop but the commands shown can work on Mac or Linux. Written tutorial and links to referenced sites: https://dustinvannoy.com/2022/04/26/sql-server-on-docker/ More from Dustin: Website: https://dustinvannoy.com LinkedIn:" [YouTube Link](https://youtube.com/watch?v=TPbCKUwJ_hE) 2022-04-27T03:56Z [----] followers, [----] engagements
"Michael Kennedy - [--] tips for developers and data scientists - Data Engineering SD Michael Kennedy presenting at Data Engineering San Diego meetup - https://www.meetup.com/Data-Engineering-San-Diego. You know that feeling when one of your developer friends or colleagues tells you about some amazing tool, library or shell environment that you never heard of that you just have to run out and try right away? This presentation is jam-packed full of those moments for developers, data scientists and data engineers. The title says [--] tips but we may veer into many more along the way. I think you'll" [YouTube Link](https://youtube.com/watch?v=hUuYsHUGVX4) 2022-04-21T14:43Z [----] followers, [---] engagements

"Synapse Kickstart: Part [--] - Manage Hub The final part of my DataKickstart series for Azure Synapse Analytics. A quick walkthrough of the Manage Hub within Azure Synapse to create and maintain Spark pools, SQL pools, linked services and more. Plus a wrap up of the series (go to the playlist to see what you missed). Synapse Kickstart playlist: More from Dustin: Website: https://dustinvannoy.com LinkedIn: https://www.linkedin.com/in/dustinvannoy Twitter: https://twitter.com/dustinvannoy Github: https://github.com/datakickstart Azure Synapse Azure DataKickstart Data Engineering Apache Spark Azure" [YouTube Link](https://youtube.com/watch?v=C3rTuQetX5k) 2022-03-31T12:00Z [----] followers, [--] engagements

"Synapse Kickstart: Part [--] - Integrate and Monitor The fourth part in my DataKickstart series for Azure Synapse. In this video I walk through the Integrate Hub and Monitor Hub. This covers how to create a pipeline that runs a Synapse Spark notebook and Synapse Serverless SQL script. It then shows how to monitor running and completed pipelines and Apache Spark applications. The next video in this series will wrap up with how the Manage Hub allows creating and managing resources within Azure Synapse Analytics. More from Dustin: Website: https://dustinvannoy.com LinkedIn:" [YouTube Link](https://youtube.com/watch?v=FsUGBWyPGwM) 2022-03-28T12:00Z [----] followers, [---] engagements

"Synapse Kickstart: Part [--] - Develop Hub (Spark/SQL Scripts) The third part in my DataKickstart series for Azure Synapse. In this video I walk through the Develop Hub, which includes developing Apache Spark notebooks and SQL scripts for both serverless and dedicated SQL pools. It includes a bit of guidance around using the different capabilities, but look out for my other videos that go more in depth. The next video in this series will pick up with scheduling jobs in the Integrate Hub and monitoring with the Monitor Hub. More from Dustin: Website: https://dustinvannoy.com LinkedIn:" [YouTube Link](https://youtube.com/watch?v=96cT1ICZGgQ) 2022-03-22T12:00Z [----] followers, [---] engagements

"Data Lifecycle Management with lakeFS - Data Engineering SD Data Lifecycle Management - Applying Engineering Best Practices for Data Presented by Itai David at Data Engineering San Diego meetup on March [--] [----]. Today when working with data lakes over object storage it is difficult to test changes in isolation, stage new data pipelines/ML models in parallel to production, ensure best practices, debug issues or revert in case of a quality issue. lakeFS is an open source project that enables managing data the same way as code. Enabling isolated development, safe data ingestion and resilient" [YouTube Link](https://youtube.com/watch?v=oYV0SKWdA28) 2022-03-17T16:01Z [----] followers, [---] engagements
"Synapse Kickstart: Part [--] - Data Hub and Querying The second part in my DataKickstart series for Azure Synapse. In this video I introduce the Azure Synapse Analytics Data Hub with explanation of linked datasets, lake database and choosing your SQL pool. Keep an eye on the playlist for the next part in this series, which will walk through the next part of the Synapse Studio developer experience. More from Dustin: Website: https://dustinvannoy.com LinkedIn: https://www.linkedin.com/in/dustinvannoy Twitter: https://twitter.com/dustinvannoy Github: https://github.com/datakickstart Azure Synapse" [YouTube Link](https://youtube.com/watch?v=pWvNtYluJzE) 2022-03-15T11:00Z [----] followers, [---] engagements

"Synapse Kickstart: Part [--] - Overview The first part in my DataKickstart series for Azure Synapse. In this video I introduce some of the core capabilities of Azure Synapse Analytics that stand out to me, with a bit of my perspective as a data engineer. Keep an eye on the playlist for the next part in this series, which will walk through part of the Synapse Studio developer experience. More from Dustin: Website: https://dustinvannoy.com LinkedIn: https://www.linkedin.com/in/dustinvannoy Twitter: https://twitter.com/dustinvannoy Github: https://github.com/datakickstart Azure Synapse Azure" [YouTube Link](https://youtube.com/watch?v=sckpvXtA24I) 2022-03-10T12:00Z [----] followers, [---] engagements

"Scheduling Synapse Spark Notebooks In this video I walk through how to schedule Synapse Spark notebooks to run daily. I also mention other scheduling options for Azure Synapse to help you schedule your data lake loads. More from Dustin: Website: https://dustinvannoy.com LinkedIn: https://www.linkedin.com/in/dustinvannoy Twitter: https://twitter.com/dustinvannoy Github: https://github.com/datakickstart CHAPTERS: 0:00 Intro 0:35 Setup Synapse Pipeline 3:00 Add Trigger 5:00 Dependent steps 5:49 Synapse Serverless SQL 8:20 Monitor 9:15 Outro Azure Synapse Apache Spark DataKickstart Data" [YouTube Link](https://youtube.com/watch?v=pzFOlG_lXhg) 2022-02-24T11:00Z [----] followers, [----] engagements

"Why is my Azure bill so high? A quick look at how I use Cost Analysis within Azure to find which resource is costing me money. In the video I filter to see why Event Hubs cost and virtual networking have increased in recent months. More from Dustin: Website: https://dustinvannoy.com LinkedIn: https://www.linkedin.com/in/dustinvannoy Twitter: https://twitter.com/dustinvannoy Github: https://github.com/datakickstart azure cost analysis event hubs synapse cost" [YouTube Link](https://youtube.com/watch?v=fCohH3xC1y4) 2022-02-09T15:58Z [----] followers, [---] engagements

"Data Engineering SD: Change Data Capture Session on Change Data Capture for Data Lakes presented by Brian Moore. This was the presentation portion of the Data Engineering San Diego meetup group. Change Data Capture (CDC) is a common approach to load data lakes with only data that has changed since the last run. This presentation will share concepts of CDC plus how Upsolver can help. Following the presentation we will have an open group discussion about CDC practices and experiences. Brian Moore is a Senior Data Lake Solutions Architect with Upsolver. He is also a long time member of Data" [YouTube Link](https://youtube.com/watch?v=QM1RgrVya00) 2022-02-07T12:00Z [----] followers, [---] engagements
"Azure Synapse: Automated Tests with FluidTest Testing code during development and as part of deployment is a best practice in Data Engineering and Software Development. How do we configure this pattern in Azure Synapse Analytics, a powerful but relatively new platform for analytics in the cloud? A special guest, Mark Cunningham, joins to share about automated testing within the Azure Synapse environment with FluidTest -- an open source library he created. The patterns discussed can work with many Azure resources beyond Azure Synapse Pipelines and contributors are welcome to continue extending it." [YouTube Link](https://youtube.com/watch?v=C3dXeFObBfI) 2022-01-28T07:05Z [----] followers, [----] engagements

"Synapse Spark: Add External Python Libraries Azure Synapse Spark pools come with all the Anaconda libraries installed, but you will likely find a need for additional libraries or different versions of libraries. It is fairly simple when the required libraries exist in a repository like Conda or PyPI (Python Packaging Index). If you build your own Python WHL file (or download one) then you can add those with just a few steps. In this video I show you how to add Python libraries to your Spark pool, either with session packages or workspace packages. Check out a written tutorial here:" [YouTube Link](https://youtube.com/watch?v=IlVPpNC0aZY) 2022-01-20T12:00Z [----] followers, [----] engagements

"Azure Synapse Spark with Kafka - Installing JAR files My story of how I got Apache Kafka working with Apache Spark on Azure Synapse Analytics Spark pools. This video shows how to use the Manage Packages capabilities of Azure Synapse and which version of the Java libraries (JAR files) actually works currently. More from Dustin: Website: https://dustinvannoy.com LinkedIn: https://www.linkedin.com/in/dustinvannoy Twitter: https://twitter.com/dustinvannoy Github: https://github.com/datakickstart CHAPTERS: 0:00 Intro / Use Case 2:00 Spark Pool setup 2:32 Managed Packages 3:50 Run and Validate 4:54" [YouTube Link](https://youtube.com/watch?v=JKhxAZzZhJE) 2021-12-17T00:44Z [----] followers, [----] engagements

"Azure Stream Analytics with Event Hubs In this video I introduce Azure Stream Analytics by walking through building a Stream Analytics Job from scratch using Azure Event Hubs as the input and output. This video includes all the setup steps and should have enough detail for you to get your own job set up for stream processing with Event Hubs and Stream Analytics. More from Dustin: Website: https://dustinvannoy.com Twitter: https://twitter.com/dustinvannoy LinkedIn: https://www.linkedin.com/in/dustinvannoy Github: https://github.com/datakickstart CHAPTERS: 0:00 Intro 1:03 Create Job 2:34 Create" [YouTube Link](https://youtube.com/watch?v=83e0HCmLFfY) 2021-11-05T13:00Z [----] followers, [----] engagements

"Data Engineering SD: Intro to Great Expectations by Allen Sallinger This user group presentation by Allen Sallinger will introduce you to the data quality tool Great Expectations (https://greatexpectations.io/). Listen in as he shows us how to use Great Expectations and some of the team from Superconductive answer our questions. Join us for the next meetup: https://www.meetup.com/Data-Engineering-San-Diego/ Data Engineering Great Expectations Data Quality Data Engineering SD" [YouTube Link](https://youtube.com/watch?v=xg7kkD_ENS8) 2021-10-18T18:25Z [----] followers, [---] engagements
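The Synapse Spark + Kafka entry above is about getting the Kafka connector JARs working with Spark. Once the right packages are in place, a minimal Structured Streaming read looks roughly like this; broker address, topic, output paths and the presence of the spark-sql-kafka connector are assumptions, and `spark` is the pool- or cluster-provided session:

```python
# Minimal sketch: read a Kafka topic with Spark Structured Streaming once the
# spark-sql-kafka connector JARs are available on the Spark pool/cluster.
# Broker, topic, checkpoint and output paths are placeholders.
from pyspark.sql import functions as F

stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "mybroker:9092")
    .option("subscribe", "events")
    .option("startingOffsets", "latest")
    .load()
)

# Kafka keys and values arrive as bytes; cast to string before further parsing.
parsed = stream.select(F.col("key").cast("string"), F.col("value").cast("string"))

query = (
    parsed.writeStream.format("delta")
    .option("checkpointLocation", "/checkpoints/events")
    .start("/lake/bronze/events")
)
```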
@dustinvannoy Dustin VannoyDustin Vannoy posts on YouTube about azure, databricks, engineering, apache the most. They currently have [-----] followers and [--] posts still getting attention that total [---] engagements in the last [--] hours.
Social category influence technology brands finance social networks stocks events
Social topic influence azure, databricks #53, engineering, apache, how to, in the, ai, this is, tutorial, san diego
Top accounts mentioned or mentioned by @24chynowethdatabrickssystemtablesanintroductione11a06872405 @abrahampabbathiintegratingmicrosoftfabricwithdatabricksf2203f65b224
Top assets mentioned Microsoft Corp. (MSFT) Alphabet Inc Class A (GOOGL)
Top posts by engagements in the last [--] hours
"dbt + Databricks Overview: SQL-based ETL In this video I introduce dbt and how it integrates with Databricks to support SQL based ETL. This video is to teach you: [--]. Why dbt is helpful for building SQL based data pipelines on any Data Warehouse platform. [--]. The basics of how you use dbt with Databricks. [--]. A few key benefits of using Databricks. All thoughts and opinions are my own References: dbt Cloud vs dbt Core: https://www.getdbt.com/product/dbt-core-vs-dbt-cloud dbt on Lakehouse Design Patterns:"
YouTube Link 2025-05-21T12:01Z [----] followers, 12.4K engagements
"Azure Stream Analytics with Event Hubs In this video I introduce Azure Stream Analytics by walking through building a Stream Analytics Job from scratch using Azure Event Hubs as the input and output. This video includes all the setup steps and should have enough detail for you to get your own job setup for stream processing with Event Hubs and Stream Analytics. More from Dustin: Website: https://dustinvannoy.com Twitter: https://twitter.com/dustinvannoy LinkedIn: https://www.linkedin.com/in/dustinvannoy Github: https://github.com/datakickstart CHAPTERS: 0:00 Intro 1:03 Create Job 2:34 Create"
YouTube Link 2021-11-05T13:00Z [----] followers, [----] engagements
"7 Best Practices for Development and CICD on Databricks In this video I share why developer experience and best practices are important and why I think Databricks offers the best developer experience for a data platform. I'll cover high level developer lifecycle and [--] ways to improve your team's development process with a goal of better quality and reliability. Stay tuned for follow up videos that cover some of the key topics discussed here. Blog post with more details and deep dives: Coming soon * All thoughts and opinions are my own * More from Dustin: Website: https://dustinvannoy.com"
YouTube Link 2024-12-20T13:38Z [----] followers, [----] engagements
"Monitoring Databricks with System Tables In this video I focus on a different side of monitoring: What do the Databricks system tables offer me for monitoring How much does this overlap with the application logs and Spark metrics Databricks System Tables are a public preview feature that can be enabled if you have Unity Catalog on your workspace. I introduce the concept in the first [--] minutes then summarize where this is most helpful in the last [--] minutes. In between are some example queries and table explanations. * All thoughts and opinions are my own * Blog post with more detail:"
YouTube Link 2024-02-22T15:20Z [----] followers, [----] engagements
"Databricks Asset Bundles: Advanced Examples Databricks Asset Bundles is now GA (Generally Available). As more Databricks users start to rely on Databricks Asset Bundles (DABs) for their development and deployment workflows let's look at some advanced patterns people have been asking for examples to help them get started. Blog post with these examples: https://dustinvannoy.com/2024/06/25/databricks-asset-bundles-advanced Intro post: https://dustinvannoy.com/2023/10/03/databricks-ci-cd-intro-to-asset-bundles-dabs * All thoughts and opinions are my own * References: Datakickstart DABs repo:"
YouTube Link 2024-06-25T12:00Z [----] followers, 17.3K engagements
"Azure Synapse Spark Monitoring with Log Analytics Log Analytics provides a way to easily query logs and setup alerts in Azure. This provides a huge help when monitoring Apache Spark. In this video I walk through setting up Azure Synapse Apache Spark to connect to Log Analytics adding custom log messages from PySpark and how to query logs from Spark log4j using Azure Log Analytics. Written tutorial and troubleshooting steps: https://dustinvannoy.com/2022/05/12/monitor-synapse-spark-with-log-analytics/ More from Dustin: Website: dustinvannoy.com Twitter: @dustinvannoy Github:"
YouTube Link 2022-05-13T13:30Z [----] followers, [----] engagements
"Databricks VS Code Extension: Serverless Compute The Databricks Extension for Visual Studio Code now supports connecting to Serverless compute when running with Databricks Connect or running as a workflow. This video provides a quick explanation of when this is a good option and a short demo of it in action. All thoughts and opinions are my own More from Dustin: Website: https://dustinvannoy.com LinkedIn: https:/linkedin.com/in/dustinvannoy Github: https://github.com/datakickstart Databricks Azure Databricks VS Code Visual Studio IDE Developer Experience Databricks Connect Serverless"
YouTube Link 2025-02-28T13:00Z [----] followers, [----] engagements
"Azure Synapse Spark Scala In this video I share with you about Apache Spark using Scala. We'll walk through a quick demo on Azure Synapse Analytics an integrated platform for analytics within Microsoft Azure cloud. This short demo is meant for those who are curious about Spark with Scala or just want to get a peek at Spark in Azure Synapse. If you are new to Apache Spark just know that it is a popular framework for data engineers that can be run in a variety of environments. It is popular because it enables distributed data processing with a relatively simple API. If you want to see examples"
YouTube Link 2021-02-04T13:00Z [----] followers, [----] engagements
"Introducing DBRX Open LLM - Data Engineering San Diego (May 2024) A special event presented by Data Engineering San Diego Databricks User Group and San Diego Software Engineers. Presentation: Introducing DBRX - Open LLM by Databricks By: Vitaliy Chiley Head of LLM Pretraining for Mosaic at Databricks DBRX is an open-source LLM by Databricks which when recently released outperformed established open-source models on a set of standard benchmarks. Join us to learn firsthand about how the Mosaic Research team built DBRX and why it matters. This talk will cover the architecture model evaluation"
YouTube Link 2024-06-04T23:03Z [----] followers, [---] engagements
"Databricks CI/CD: Azure DevOps Pipeline + DABs Many organizations choose Azure DevOps for automated deployments on Azure. When deploying to Databricks you can take similar deploy pipeline code that you use for other projects but use it with Databricks Asset Bundles. This video shows most of the steps involved in setting this up by following along with a blog post that shares example code and steps. * All thoughts and opinions are my own * Blog post on DABs with Azure DevOps:"
YouTube Link 2024-08-19T14:30Z [----] followers, 28.4K engagements
"Spark Monitoring: Basics In this video I cover how to use the default UI for monitoring Apache Spark. I use Azure Databricks to demonstrate but most of the methods are the same in any Spark environment. I focus most on how I use it in my day to day work as a Data Engineer. Topics included: Spark UI query plan basic Ganglia Metrics and driver logs. Monitoring Streams with Custom Streaming Query Listener: https://youtu.be/iqIdmCvSwwUt=268 More from Dustin: Website: dustinvannoy.com Twitter: @dustinvannoy Github: https://github.com/datakickstart"
YouTube Link 2021-07-26T15:16Z [----] followers, 19.7K engagements
"Claude Code: [--] Essentials for Data Engineering This video is a guide for data professionals (Data Engineers Data Scientists and Analytics Engineers) on adopting AI for development using Claude Code. The main idea: success with Claude Code comes down to managing context and memory well. Claude Code works as an agentit handles multi-file edits runs tests and fixes errors pretty autonomously. The [--] essentials you should understand about Claude Code: [--]. claude.md - Your main instruction file. It tells Claude your project's rules frameworks and design principles from the start. [--]. Skills -"
YouTube Link 2026-01-08T14:01Z [----] followers, [----] engagements
"Databricks VS Code Extension v2: Setup and Feature Demo Databricks Visual Studio Code Extension v2 the next major release is now generally available. In this video I walk through the initial setup and the main ways you will run code and deploy resources using this extension. I also provide some key tips to make sure you don't get stuck along the way. * All thoughts and opinions are my own * References: Databricks blog: https://www.databricks.com/blog/simplified-faster-development-new-capabilities-databricks-vs-code-extension Databricks Asset Bundle training:"
YouTube Link 2024-09-26T12:00Z [----] followers, 19.2K engagements
"Data + AI Summit 2023: Key Takeaways Data + AI Summit key takeaways from a Data Engineers perspective. Which features coming to Apache Spark and to Databricks are most exciting for data engineering I cover that plus a decent amount of AI and LLM talk in this informal video. See the blog post for a bit more thought out summaries and links to many of the keynote demos related to the features I am excited about. Blog post: https://dustinvannoy.com/2023/06/30/dais-2023-data-engineer-takeaways/ * All thoughts and opinions are my own * Databricks Apache Spark Azure Databricks DataAISummit Data AI"
YouTube Link 2023-07-01T13:35Z [----] followers, [---] engagements
"Databricks + Cursor IDE: Step-by-Step AI Coding Tutorial The tech industry has evolved rapidly and AI coding tools are changing how we develop. These tools have amazing potential to speed up the development process for data professionals building Databricks focused projects. While we see advanced AI features in the online Databricks workspace there are also amazing capabilities in tools you install on your development machine such as Cursor and Claude Code. In this video I explain recommendations use Cursor with Databricks. This includes using some development tools that I have shared about"
YouTube Link 2025-09-30T12:00Z [----] followers, [----] engagements
"Parallel table ingestion with a Spark Notebook (PySpark + Threading) If we want to kick off a single Apache Spark notebook to process a list of tables we can write the code easily. The simple code to loop through the list of tables ends up running one table after another (sequentially). If none of these tables are very big it is quicker to have Spark load tables concurrently (in parallel) using multithreading. There are some different options of how to do this but I am sharing the easiest way I have found when working with a PySpark notebook in Databricks Azure Synapse Spark Jupyter or"
YouTube Link 2022-05-06T14:20Z [----] followers, 17.3K engagements
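The threading approach described above can be sketched roughly like this, assuming a notebook environment where `spark` already exists; the JDBC source, credentials, and table names are placeholders for illustration.

```python
# Hedged sketch of parallel table ingestion with PySpark + threading.
from concurrent.futures import ThreadPoolExecutor

tables = ["customers", "orders", "order_items", "products"]

def load_table(table_name: str) -> str:
    # Each thread submits its own Spark job; Spark schedules them concurrently.
    df = (spark.read.format("jdbc")
          .option("url", "jdbc:sqlserver://myserver.example.com;database=sales")  # placeholder
          .option("dbtable", table_name)
          .option("user", "loader")
          .option("password", "<secret>")
          .load())
    df.write.mode("overwrite").format("delta").saveAsTable(f"bronze.{table_name}")
    return table_name

# Run up to 4 table loads at the same time instead of one after another.
with ThreadPoolExecutor(max_workers=4) as pool:
    for finished in pool.map(load_table, tables):
        print(f"loaded {finished}")
```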
"Databricks CI/CD: Intro to Databricks Asset Bundles (DABs) Databricks Asset Bundles provide a way to use the command line to deploy and run a set of Databricks assets - like notebooks Python code Delta Live Tables pipelines and workflows. This is useful both for running jobs that are being developed locally and for automating CI/CD processes that will deploy and test code changes. In this video I explain why Databricks Asset Bundles are a good option for CI/CD and demo how to initialize a project and setup your first GitHub Action using DABs. Blog post with extra examples:"
YouTube Link 2023-10-04T12:00Z [----] followers, 40.2K engagements
"Claude Code: [--] Essentials for Data Engineering This video is a guide for data professionals (Data Engineers Data Scientists and Analytics Engineers) on adopting AI for development using Claude Code. The main idea: success with Claude Code comes down to managing context and memory well. Claude Code works as an agentit handles multi-file edits runs tests and fixes errors pretty autonomously. The [--] essentials you should understand about Claude Code: [--]. claude.md - Your main instruction file. It tells Claude your project's rules frameworks and design principles from the start. [--]. Skills -"
YouTube Link 2026-01-08T14:01Z [----] followers, [----] engagements
"Databricks + Cursor IDE: Step-by-Step AI Coding Tutorial The tech industry has evolved rapidly and AI coding tools are changing how we develop. These tools have amazing potential to speed up the development process for data professionals building Databricks focused projects. While we see advanced AI features in the online Databricks workspace there are also amazing capabilities in tools you install on your development machine such as Cursor and Claude Code. In this video I explain recommendations use Cursor with Databricks. This includes using some development tools that I have shared about"
YouTube Link 2025-09-30T12:00Z [----] followers, [----] engagements
"dbt + Databricks Overview: SQL-based ETL In this video I introduce dbt and how it integrates with Databricks to support SQL based ETL. This video is to teach you: [--]. Why dbt is helpful for building SQL based data pipelines on any Data Warehouse platform. [--]. The basics of how you use dbt with Databricks. [--]. A few key benefits of using Databricks. All thoughts and opinions are my own References: dbt Cloud vs dbt Core: https://www.getdbt.com/product/dbt-core-vs-dbt-cloud dbt on Lakehouse Design Patterns:"
YouTube Link 2025-05-21T12:01Z [----] followers, 12.4K engagements
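As a rough illustration of running dbt against Databricks from Python rather than the shell, dbt Core 1.5+ exposes a programmatic runner. This sketch assumes the dbt-databricks adapter is installed and a profile is configured for your Databricks SQL warehouse; the selector name is hypothetical.

```python
# Hedged sketch: programmatic dbt invocation (dbt-core 1.5+).
from dbt.cli.main import dbtRunner, dbtRunnerResult

dbt = dbtRunner()

# Equivalent to running `dbt run --select staging` from the command line.
result: dbtRunnerResult = dbt.invoke(["run", "--select", "staging"])

if not result.success:
    raise RuntimeError(f"dbt run failed: {result.exception}")
```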
"Databricks VS Code Extension: Serverless Compute The Databricks Extension for Visual Studio Code now supports connecting to Serverless compute when running with Databricks Connect or running as a workflow. This video provides a quick explanation of when this is a good option and a short demo of it in action. All thoughts and opinions are my own More from Dustin: Website: https://dustinvannoy.com LinkedIn: https:/linkedin.com/in/dustinvannoy Github: https://github.com/datakickstart Databricks Azure Databricks VS Code Visual Studio IDE Developer Experience Databricks Connect Serverless"
YouTube Link 2025-02-28T13:00Z [----] followers, [----] engagements
"Unity Catalog OSS Spotlight Unity Catalog Open Source Software (OSS) is a compelling project and there are some key benefits to working with it locally. In this video I share reason for using the open source project Unity Catalog (UC) and walk through some of the setup and testing I did to create and write to tables from Apache Spark. All thoughts and opinions are my own Links: Blog post - https://dustinvannoy.com/2025/01/30/oss-spotlight-unity-catalog Roadmap 2024Q4 - https://github.com/unitycatalog/unitycatalog/discussions/411 Events - https://lu.ma/unity-catalog More from Dustin:"
YouTube Link 2025-01-30T13:00Z [----] followers, [----] engagements
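A rough sketch of the local setup described above: pointing a PySpark session at a locally running Unity Catalog OSS server. The package coordinates, catalog class name, and config keys are taken from my reading of the UC OSS quickstart and may differ by version; treat them as placeholders to verify against the project docs.

```python
# Hedged sketch (config keys and versions are assumptions to verify).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("uc-oss-demo")
    .config("spark.jars.packages",
            "io.delta:delta-spark_2.12:3.2.1,io.unitycatalog:unitycatalog-spark_2.12:0.2.0")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.unity", "io.unitycatalog.spark.UCSingleCatalog")
    .config("spark.sql.catalog.unity.uri", "http://localhost:8080")  # local UC server
    .config("spark.sql.defaultCatalog", "unity")
    .getOrCreate()
)

# Create and write to a table registered in the local Unity Catalog.
spark.sql("CREATE SCHEMA IF NOT EXISTS unity.demo")
spark.range(10).write.mode("overwrite").saveAsTable("unity.demo.numbers")
spark.table("unity.demo.numbers").show()
```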
"Developer Best Practices on Databricks: Git Tests and Automated Deployment Data engineers and data scientists benefit from using best practices learned from years of software development. This video walks through [--] of the most important practices to build quality analytics solutions. It is meant to be an overview of what following these practices looks like for a Databricks developer. This video covers: - Version control basics and demo of Git integration with Databricks workspace - Automated tests with pytest for unit testing and Databricks Workflows for integration testing - CI/CD including"
YouTube Link 2025-01-06T13:00Z [----] followers, 14.9K engagements
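A minimal sketch of the unit-testing piece mentioned above: a pytest test for a small PySpark transformation that runs locally. The transformation and column names are illustrative examples, not code from the video.

```python
# Hedged sketch: pytest unit test for a PySpark transformation.
import pytest
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql import functions as F


def add_order_total(df: DataFrame) -> DataFrame:
    """Example transformation under test: quantity * unit_price."""
    return df.withColumn("order_total", F.col("quantity") * F.col("unit_price"))


@pytest.fixture(scope="session")
def spark():
    # Local Spark session so the test runs without a cluster.
    return SparkSession.builder.master("local[2]").appName("unit-tests").getOrCreate()


def test_add_order_total(spark):
    df = spark.createDataFrame([(2, 10.0), (3, 5.0)], ["quantity", "unit_price"])
    result = {row["order_total"] for row in add_order_total(df).collect()}
    assert result == {20.0, 15.0}
```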
"7 Best Practices for Development and CICD on Databricks In this video I share why developer experience and best practices are important and why I think Databricks offers the best developer experience for a data platform. I'll cover high level developer lifecycle and [--] ways to improve your team's development process with a goal of better quality and reliability. Stay tuned for follow up videos that cover some of the key topics discussed here. Blog post with more details and deep dives: Coming soon * All thoughts and opinions are my own * More from Dustin: Website: https://dustinvannoy.com"
YouTube Link 2024-12-20T13:38Z [----] followers, [----] engagements
"Databricks VS Code: Multiple Projects In VS Code Workspace In this video I cover a specific option for work with Databricks Visual Studio Code Extensionwhat it I have many project folders each as their own bundle but I want to work in the same VS Code workspace I talk through a couple ways to work with this and show how to switch the active project folder in order to run files from different bundles. You may need this if: - VS Code is only opening one Databricks project but you want multiple open in the same session. - You are getting error on Python script like "Error: init: listener: timed"
YouTube Link 2024-10-15T14:36Z [----] followers, [----] engagements
"Databricks VS Code Extension v2: Upgrade steps In this short video I show you how to upgrade a project from using Databricks Visual Studio Code version [--] to using the new version. There are a few key setup steps included and a quick glimpse at the new Databricks run button. For a more complete view of using the Databricks Visual Studio Code extension see this video: https://www.youtube.com/watchv=o4qMWHgT1zM * All thoughts and opinions are my own * More from Dustin: Website: https://dustinvannoy.com LinkedIn: https:/linkedin.com/in/dustinvannoy Github: https://github.com/datakickstart"
YouTube Link 2024-10-02T14:35Z [----] followers, [---] engagements
"Databricks VS Code Extension v2: Setup and Feature Demo Databricks Visual Studio Code Extension v2 the next major release is now generally available. In this video I walk through the initial setup and the main ways you will run code and deploy resources using this extension. I also provide some key tips to make sure you don't get stuck along the way. * All thoughts and opinions are my own * References: Databricks blog: https://www.databricks.com/blog/simplified-faster-development-new-capabilities-databricks-vs-code-extension Databricks Asset Bundle training:"
YouTube Link 2024-09-26T12:00Z [----] followers, 19.2K engagements
"Databricks CI/CD: Azure DevOps Pipeline + DABs Many organizations choose Azure DevOps for automated deployments on Azure. When deploying to Databricks you can take similar deploy pipeline code that you use for other projects but use it with Databricks Asset Bundles. This video shows most of the steps involved in setting this up by following along with a blog post that shares example code and steps. * All thoughts and opinions are my own * Blog post on DABs with Azure DevOps:"
YouTube Link 2024-08-19T14:30Z [----] followers, 28.4K engagements
"Databricks Asset Bundles: Advanced Examples Databricks Asset Bundles is now GA (Generally Available). As more Databricks users start to rely on Databricks Asset Bundles (DABs) for their development and deployment workflows let's look at some advanced patterns people have been asking for examples to help them get started. Blog post with these examples: https://dustinvannoy.com/2024/06/25/databricks-asset-bundles-advanced Intro post: https://dustinvannoy.com/2023/10/03/databricks-ci-cd-intro-to-asset-bundles-dabs * All thoughts and opinions are my own * References: Datakickstart DABs repo:"
YouTube Link 2024-06-25T12:00Z [----] followers, 17.3K engagements
"Introducing DBRX Open LLM - Data Engineering San Diego (May 2024) A special event presented by Data Engineering San Diego Databricks User Group and San Diego Software Engineers. Presentation: Introducing DBRX - Open LLM by Databricks By: Vitaliy Chiley Head of LLM Pretraining for Mosaic at Databricks DBRX is an open-source LLM by Databricks which when recently released outperformed established open-source models on a set of standard benchmarks. Join us to learn firsthand about how the Mosaic Research team built DBRX and why it matters. This talk will cover the architecture model evaluation"
YouTube Link 2024-06-04T23:03Z [----] followers, [---] engagements
"Monitoring Databricks with System Tables In this video I focus on a different side of monitoring: What do the Databricks system tables offer me for monitoring How much does this overlap with the application logs and Spark metrics Databricks System Tables are a public preview feature that can be enabled if you have Unity Catalog on your workspace. I introduce the concept in the first [--] minutes then summarize where this is most helpful in the last [--] minutes. In between are some example queries and table explanations. * All thoughts and opinions are my own * Blog post with more detail:"
YouTube Link 2024-02-22T15:20Z [----] followers, [----] engagements
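For a flavor of the example queries mentioned above, here is a hedged query against the billing system table. It assumes system tables are enabled through Unity Catalog; the column names reflect `system.billing.usage` at the time of writing and are worth checking against your workspace's current schema.

```python
# Hedged sketch: summarize the last 30 days of usage from a Databricks system table.
recent_usage = spark.sql("""
    SELECT usage_date,
           sku_name,
           SUM(usage_quantity) AS total_dbus
    FROM system.billing.usage
    WHERE usage_date >= date_sub(current_date(), 30)
    GROUP BY usage_date, sku_name
    ORDER BY usage_date DESC, total_dbus DESC
""")
recent_usage.show(truncate=False)
```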
"Databricks Monitoring with Log Analytics - Updated for DBR 11.3+ In this video I show the latest way to setup and use Log Analytics for storing and querying you Databricks logs. My prior video covered the steps for earlier Databricks Runtime Versions (prior to 11.0). This video covers using the updated code for Databricks Runtime [----] [----] or [----]. There are various options for monitoring Databricks but since Log Analytics provides a way to easily query logs and setup alerts in Azure you may choose to send your Databricks logs there as well. * All thoughts and opinions are my own * Blog post"
YouTube Link 2024-01-08T13:00Z [----] followers, [----] engagements
"Databricks CI/CD: Intro to Databricks Asset Bundles (DABs) Databricks Asset Bundles provide a way to use the command line to deploy and run a set of Databricks assets - like notebooks Python code Delta Live Tables pipelines and workflows. This is useful both for running jobs that are being developed locally and for automating CI/CD processes that will deploy and test code changes. In this video I explain why Databricks Asset Bundles are a good option for CI/CD and demo how to initialize a project and setup your first GitHub Action using DABs. Blog post with extra examples:"
YouTube Link 2023-10-04T12:00Z [----] followers, 40.2K engagements
"Data + AI Summit 2023: Key Takeaways Data + AI Summit key takeaways from a Data Engineers perspective. Which features coming to Apache Spark and to Databricks are most exciting for data engineering I cover that plus a decent amount of AI and LLM talk in this informal video. See the blog post for a bit more thought out summaries and links to many of the keynote demos related to the features I am excited about. Blog post: https://dustinvannoy.com/2023/06/30/dais-2023-data-engineer-takeaways/ * All thoughts and opinions are my own * Databricks Apache Spark Azure Databricks DataAISummit Data AI"
YouTube Link 2023-07-01T13:35Z [----] followers, [---] engagements
"PySpark Kickstart - Read and Write Data with Apache Spark Every Spark pipeline involves reading data from a data source or table and often ends with writing data. In this video we walk through some of the most common formats and cloud storage used for reading and writing with Spark. Includes some guidance on authenticating to ADLS OneLake S3 Google Cloud Storage Azure SQL Database and Snowflake. Once you have watched this tutorial go find a free dataset and try to read and write within your environment. * All thoughts and opinions are my own * For links to the code and more information on"
YouTube Link 2023-06-22T12:00Z [----] followers, [----] engagements
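As a sketch of the read/write patterns the video covers, the snippet below authenticates to ADLS Gen2 with a service principal and round-trips a dataset from CSV to Delta. The storage account, containers, and credential values are placeholders; it assumes a Spark session (`spark`) already exists, as in a notebook.

```python
# Hedged sketch: service principal auth to ADLS Gen2, then read CSV and write Delta.
storage = "mystorageaccount"  # placeholder storage account name
spark.conf.set(f"fs.azure.account.auth.type.{storage}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage}.dfs.core.windows.net", "<app-client-id>")
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage}.dfs.core.windows.net", "<app-client-secret>")
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage}.dfs.core.windows.net",
               "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

# Read a CSV dataset from the lake and write it back out as a Delta table.
df = (spark.read.option("header", "true").option("inferSchema", "true")
      .csv(f"abfss://raw@{storage}.dfs.core.windows.net/nyc_taxi/"))
(df.write.format("delta").mode("overwrite")
   .save(f"abfss://curated@{storage}.dfs.core.windows.net/nyc_taxi_delta/"))
```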
"Data Engineering San Diego - Intro to dbt Data Engineering San Diego group monthly meeting for a presentation followed by group questions and responses. We will start the stream as close to 5:30 as we can. dbt - Introduction and demo As a data practitioner have you ever lost trust from a stakeholder because of inaccurate data on a dashboard Could that have been fixed by testing that the primary keys were actually unique Are you a sql analyst that is intimately familiar with the business logic needed for a use case but are dependent on someone else to build the pipeline for you In this session"
YouTube Link 2023-05-19T13:58Z [----] followers, [---] engagements
"Spark SQL Kickstart: Your first Spark SQL application Get hands on with Spark SQL to build your first data pipeline. In this video I walk you through how to read transform and write the NYC Taxi dataset which can be found on Databricks Azure Synapse or downloaded from the web to wherever you run Apache Spark. Once you have watched and followed along with this tutorial go find a free dataset and try to write your own Spark application. * All thoughts and opinions are my own * For links to the code and more information on this course you can visit my website:"
YouTube Link 2023-05-18T12:00Z [----] followers, [----] engagements
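A small sketch in the spirit of the pipeline described above: register the raw NYC Taxi data as a temporary view, transform it with Spark SQL, and save the result. The path is hypothetical and the column names are assumptions based on the public yellow-taxi schema, not code from the video.

```python
# Hedged sketch: read, transform with Spark SQL, and write the result.
raw = spark.read.parquet("/datasets/nyc_taxi/yellow/")  # hypothetical path
raw.createOrReplaceTempView("taxi_raw")

trips_by_day = spark.sql("""
    SELECT CAST(tpep_pickup_datetime AS DATE) AS pickup_date,
           COUNT(*)                           AS trip_count,
           ROUND(SUM(total_amount), 2)        AS total_revenue
    FROM taxi_raw
    WHERE total_amount > 0
    GROUP BY CAST(tpep_pickup_datetime AS DATE)
""")

trips_by_day.write.format("delta").mode("overwrite").saveAsTable("nyc_taxi_daily_summary")
```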
"PySpark Kickstart - Your first Apache Spark data pipeline Get hands on with Python and PySpark to build your first data pipeline. In this video I walk you through how to read transform and write the NYC Taxi dataset which can be found on Databricks Azure Synapse or downloaded from the web to wherever you run Apache Spark. Once you have watched and followed along with this tutorial go find a free dataset and try to write your own PySpark application. Pro tip: Search for the Spark equivalent of functions you use in other programming languages (including SQL). Many will exist in the"
YouTube Link 2023-05-02T11:00Z [----] followers, [----] engagements
"Data Engineering San Diego - Intro to Large Language Models Special edition of the Data Engineering group where Data Engineering San Diego partnered with San Diego Software Engineers for an event. * We had a lot of in-person attendees and a lot of conversation so the mic could not be situated to catch all of the audio well. We try to make the livestream helpful for those that cannot attend in person but the room mics do not pick up enough conversation to provide as good of a virtual experience as we would like. * Introduction to Large Language Models for Software/Data Engineers Large Language"
YouTube Link 2023-04-26T14:47Z [----] followers, [---] engagements
"Spark Environment - Azure Databricks Trial In this video I cover how to setup a free Azure Trial and spin up a free Azure Databricks Trial. This is a great way to have an option for testing out Databricks and learning Apache Spark on Azure. Once setup you will see how to run a very simple test notebook. * All thoughts and opinions are my own * Additional links: Setup Databricks on AWS - https://youtu.be/gEDS5DOUgY8 Setup Databricks on Google Cloud - https://youtu.be/hquLYNN8nz8 Azure Databricks setup documentation: https://learn.microsoft.com/en-us/azure/databricks/getting-started/ More from"
YouTube Link 2023-04-19T11:00Z [----] followers, [---] engagements
"Spark Environment - Databricks Community Edition In this video I cover how to setup a free Databricks community edition environment. This is a great way to have an option for testing out Databricks and learning Apache Spark and it doesnt expire after [--] days. It is limited functionality and scalability though so you wont be able to run a realistic proof of concept on this environment. Once setup you will see how to run a very simple test notebook. If you want to see how to setup a more a trial or more permanent Databricks environment in Azure see my other video - Coming soon * All thoughts"
YouTube Link 2023-04-11T13:17Z [----] followers, [----] engagements
"Apache Spark DataKickstart - Introduction to Spark In this video I provide introduction to Apache Spark as part of my YouTube course Apache Spark DataKickstart. This video covers why Spark is popular what it really is and a bit about ways to run Apache Spark. Please check out other videos in this series by selecting the relevant playlist or subscribe and turn on notifications for new videos (coming soon). * All thoughts and opinions are my own * Learn more about Apache Spark with one of these options (including a few paid options): Databricks Training -"
YouTube Link 2023-03-23T12:59Z [----] followers, [----] engagements
"Unity Catalog setup for Azure Databricks In this video I walk through setting up Unity Catalog on Azure and quickly exploring the cataloging features for a couple tables with a workflow. This includes setting up storage and access connector then a quick walk through of lineage and other metadata tracked at the table level. * All thoughts and opinions are my own * Learn more about Unity Catalog with one of the below videos: Short intro from Databricks - https://www.youtube.com/watchv=U1Ez5LNzl48 Data Lineage - https://www.youtube.com/watchv=8wGUnXhISz0 Deeper dive introduction -"
YouTube Link 2023-03-01T12:00Z [----] followers, 17.2K engagements
"Visual Studio Code Extension for Databricks In this video I show how I installed configured and tested out the VS Code extension for Azure Databricks. This provides a way to develop PySpark code in your Visual Studio Code IDE and run the code on a Databricks cluster. It works well with Databricks Git Repos so you can keep your team in sync whether they work in VS Code or in Notebooks on the Databricks workspace. IMPORTANT UPDATE to how I explained this in the video: The repo used for syncing from local will not be an existing Databricks repo if using the update version (0.3.0+). This is to"
YouTube Link 2023-02-21T12:00Z [----] followers, 20.4K engagements
"Parallel Load in Spark Notebook - Questions Answered In this video I address questions from the tutorial I did on Parallel Table Ingestion with Spark Notebooks. See the chapters outline below for which questions are addressed. Original video that questions came from: https://dustinvannoy.com/2022/05/06/parallel-ingest-spark-notebook/ Q&A Writeup: https://dustinvannoy.com/2023/01/09/questions-answered-parallel-spark-notebook Running in Scala: https://docs.databricks.com/notebooks/noteblook-workflows.html#run-multiple-notebooks-concurrently More from Dustin: Website: https://dustinvannoy.com"
YouTube Link 2023-01-09T14:32Z [----] followers, [----] engagements
"Delta Change Feed and Delta Merge pipeline (extended demo) This video shows an extended demo a pipeline that loads refined (silver) and curated (gold) tables. It complements the demo from the session "Data Ingestion - Practical Data Loading with Azure" that was part of PASS Data Community Summity [----]. It shows use of the Delta Lake Change Data Feed and the Merge command to track and process only inserts updates and deletes. Related content: Concurrent data ingestion in Spark notebooks - https://youtu.be/hFGu2enSjTY Databricks blog on change data feed -"
YouTube Link 2022-11-16T17:04Z [----] followers, [----] engagements
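The change-feed-plus-merge flow described above might look roughly like the sketch below. Table names and the starting version are placeholders, and the source table needs change data feed enabled before any changes are captured.

```python
# Hedged sketch: Delta Change Data Feed feeding a MERGE into a silver table.
from delta.tables import DeltaTable

spark.sql("""
    ALTER TABLE bronze.orders
    SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")

# Read the change feed since a known version and keep inserts and post-update rows.
changes = (spark.read.format("delta")
           .option("readChangeFeed", "true")
           .option("startingVersion", 5)  # placeholder checkpointed version
           .table("bronze.orders")
           .filter("_change_type IN ('insert', 'update_postimage')"))

# Merge the changed rows into the refined (silver) table.
silver = DeltaTable.forName(spark, "silver.orders")
(silver.alias("t")
 .merge(changes.alias("s"), "t.order_id = s.order_id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())
```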
"Data Engineering SD: Rise of Immediate Intelligence - Apache Druid Presenter: Sergio Ferragut - Imply Decision making is changing: Apache Druid is a new type of database for creating the next generation of analytics applications that maximize flexible exploration over fresh fast-arriving data. In this talk Sergio Ferragut (Developer Advocate in Imply's Community Team) introduces these new "immediate intelligence" applications tells the story of Druid's emergence and describes how data pipelines built with Druid differ from those you may already be familiar with. Apache Druid Imply Data"
YouTube Link 2022-07-22T13:55Z [----] followers, [---] engagements
"Azure Synapse integration with Microsoft Purview data catalog In this video I show how to populate the Microsoft Purview data map and data catalog with metadata about Azure Synapse Analytics datasets. I cover brief explanation of these two important parts of Microsoft Purview then show how to setup permissions and ingest metadata about tables in Synapse Serverless SQL and Synapse Dedicated SQL pools. NOTE: This will cost money to have the Purview resource and to run scans. You are responsible to look at possible costs before running in your environment. I spent at least $30 on Purview related"
YouTube Link 2022-07-19T12:00Z [----] followers, [----] engagements
"Adi Polak - Chaos Engineering - Managing Stages in a Complex Data Flow - Data Engineering SD Chaos Engineering and how to manage data stages in Large-Scale Complex Data Flow Presenter: Adi Polak A complex data flow is a set of operations to extract information from multiple sources copy them into multiple data targets while using extract transformations joins filters and sorts to refine the results. These are precisely the capabilities that the new open modern data stack provides us. Spark and other tools allow us to develop complex data flow on large-scale data. Chaos Engineering concepts"
YouTube Link 2022-06-21T12:30Z [----] followers, [---] engagements
"Azure Synapse Spark Monitoring with Log Analytics Log Analytics provides a way to easily query logs and setup alerts in Azure. This provides a huge help when monitoring Apache Spark. In this video I walk through setting up Azure Synapse Apache Spark to connect to Log Analytics adding custom log messages from PySpark and how to query logs from Spark log4j using Azure Log Analytics. Written tutorial and troubleshooting steps: https://dustinvannoy.com/2022/05/12/monitor-synapse-spark-with-log-analytics/ More from Dustin: Website: dustinvannoy.com Twitter: @dustinvannoy Github:"
YouTube Link 2022-05-13T13:30Z [----] followers, [----] engagements
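One piece of the setup above, adding custom log messages from PySpark, can be sketched as writing through the driver's log4j logger so the messages travel with the rest of the Spark logs into Log Analytics. The logger name and path are arbitrary, and the availability of the log4j 1.x API through `_jvm` is an assumption that depends on your Spark runtime.

```python
# Hedged sketch: custom driver-side log messages via log4j from PySpark.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
log4j = spark._jvm.org.apache.log4j
logger = log4j.LogManager.getLogger("datakickstart.custom")  # arbitrary logger name

logger.info("Starting nightly load")
try:
    row_count = spark.read.parquet("/data/raw/events/").count()  # hypothetical path
    logger.info(f"Loaded events, row_count={row_count}")
except Exception as e:
    logger.error(f"Nightly load failed: {e}")
    raise
```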
"Parallel table ingestion with a Spark Notebook (PySpark + Threading) If we want to kick off a single Apache Spark notebook to process a list of tables we can write the code easily. The simple code to loop through the list of tables ends up running one table after another (sequentially). If none of these tables are very big it is quicker to have Spark load tables concurrently (in parallel) using multithreading. There are some different options of how to do this but I am sharing the easiest way I have found when working with a PySpark notebook in Databricks Azure Synapse Spark Jupyter or"
YouTube Link 2022-05-06T14:20Z [----] followers, 17.3K engagements
"SQL Server On Docker + deploy DB to Azure In this video I show how to setup SQL Server on Docker attached a database from local mdf file and then deploy to an Azure SQL database. What I like about using docker for this work is that I can script out the setup and only keep the container and image around when I need it. In the video I set it up using a Windows [--] laptop but the commands shown can work on Mac or Linux. Written tutorial and links to referenced sites: https://dustinvannoy.com/2022/04/26/sql-server-on-docker/ More from Dustin: Website: https://dustinvannoy.com LinkedIn:"
YouTube Link 2022-04-27T03:56Z [----] followers, [----] engagements
"Michael Kennedy - [--] tips for developers and data scientists - Data Engineering SD Michael Kennedy presenting at Data Engineering San Diego meetup - https://www.meetup.com/Data-Engineering-San-Diego. You know that feeling when one of your developer friends or colleague tells you about some amazing tool library or shell environment that you never heard of that you just have to run out and try right away This presentation is jam-packed full of those moments for developers data scientists and data engineers. The title says [--] tips but we may veer into many more along the way. I think you'll"
YouTube Link 2022-04-21T14:43Z [----] followers, [---] engagements
"Synapse Kickstart: Part [--] - Manage Hub The final part of my DataKickstart series for Azure Synapse Analytics. A quick walkthrough of the Manage Hub within Azure Synapse to create and maintain Spark pools SQL pools linked services and more. Plus a wrap up of the series (go to the playlist to see what you missed). Synapse Kickstart playlist: More from Dustin: Website: https://dustinvannoy.com LinkedIn: https://www.linkedin.com/in/dustinvannoy Twitter: https://twitter.com/dustinvannoy Github: https://github.com/datakickstart Azure Synapse Azure DataKickstart Data Engineering Apache Spark Azure"
YouTube Link 2022-03-31T12:00Z [----] followers, [--] engagements
"Synapse Kickstart: Part [--] - Integrate and Monitor The fourth part in my DataKickstart series for Azure Synapse. In this video I walk through the Integrate Hub and Monitor Hub. This covers how to create a pipeline that runs a Synapse Spark notebook and Synapse Serverless SQL script. It then shows how to monitor running and completed pipelines and Apache Spark applications. The next video in this series will wrap up with how the Manage Hub allows creating and manages resources within Azure Synapse Analytics. More from Dustin: Website: https://dustinvannoy.com LinkedIn:"
YouTube Link 2022-03-28T12:00Z [----] followers, [---] engagements
"Synapse Kickstart: Part [--] - Develop Hub (Spark/SQL Scripts) The third part in my DataKickstart series for Azure Synapse. In this video I walk through the Develop Hub which includes developing Apache Spark notebooks and SQL scripts for both serverless and dedicated SQL pools. It includes a bit of guidance around using the different capabilities but look out for my other videos that go more in depth. The next video in this series will pick up with scheduling jobs in the Integrate Hub and monitoring with the Monitor Hub. More from Dustin: Website: https://dustinvannoy.com LinkedIn:"
YouTube Link 2022-03-22T12:00Z [----] followers, [---] engagements
"Data Lifecycle Management with lakeFS - Data Engineering SD Data Lifecycle Management - Applying Engineering Best Practices for Data Presented by Itai David at Data Engineering San Diego meetup on March [--] [----]. Today when working with data lakes over object storage it is difficult to test changes in isolation stage new data pipelines/ML models in parallel to production ensure best practices debug issues or revert in case of a quality issue. lakeFS is an open source project that enables managing data the same way as code. Enabling isolated development safe data ingestion and resilient"
YouTube Link 2022-03-17T16:01Z [----] followers, [---] engagements
"Synapse Kickstart: Part [--] - Data Hub and Querying The second part in my DataKickstart series for Azure Synapse. In this video I introduce the Azure Synapse Analytics Data Hub with explanation of linked datasets lake database and choosing your SQL pool. Keep an eye on the playlist for the next part in this series which will walk through the next part of the Synapse Studio developer experience. More from Dustin: Website: https://dustinvannoy.com LinkedIn: https://www.linkedin.com/in/dustinvannoy Twitter: https://twitter.com/dustinvannoy Github: https://github.com/datakickstart Azure Synapse"
YouTube Link 2022-03-15T11:00Z [----] followers, [---] engagements
"Synapse Kickstart: Part [--] - Overview The first part in my DataKickstart series for Azure Synapse. In this video I introduce some of the core capabilities of Azure Synapse Analytics that stand out to me with a bit of my perspective as a data engineer. Keep an eye on the playlist for the next part in this series which will walk through part of the Synapse Studio developer experience. More from Dustin: Website: https://dustinvannoy.com LinkedIn: https://www.linkedin.com/in/dustinvannoy Twitter: https://twitter.com/dustinvannoy Github: https://github.com/datakickstart Azure Synapse Azure"
YouTube Link 2022-03-10T12:00Z [----] followers, [---] engagements
"Scheduling Synapse Spark Notebooks In this video I walk through how to schedule Synapse Spark notebooks to run daily. I also mention other scheduling options for Azure Synapse to help you schedule your data lake loads. More from Dustin: Website: https://dustinvannoy.com LinkedIn: https://www.linkedin.com/in/dustinvannoy Twitter: https://twitter.com/dustinvannoy Github: https://github.com/datakickstart CHAPTERS: 0:00 Intro 0:35 Setup Synapse Pipeline 3:00 Add Trigger 5:00 Dependent steps 5:49 Synapse Serverless SQL 8:20 Monitor 9:15 Outro Azure Synapse Apache Spark DataKickstart Data"
YouTube Link 2022-02-24T11:00Z [----] followers, [----] engagements
"Why is my Azure bill so high A quick look at how I use Cost Analysis within Azure to find which resource is costing me money. In the video I filter to see why Event Hubs cost and virtual networking have increased in recent months. More from Dustin: Website: https://dustinvannoy.com LinkedIn: https://www.linkedin.com/in/dustinvannoy Twitter: https://twitter.com/dustinvannoy Github: https://github.com/datakickstart azure cost analysis event hubs synapse cost azure cost analysis event hubs synapse cost"
YouTube Link 2022-02-09T15:58Z [----] followers, [---] engagements
"Data Engineering SD: Change Data Capture Session on Change Data Capture for Data Lakes presented by Brian Moore. This was the presentation portion of Data Engineering San Diego meetup group. Change Data Capture (CDC) is a common approach to load data lakes with only data that has changed since the last run. This presentation will share concepts of CDC plus how Upsolver can help. Following the presentation we will have an open group discussion about CDC practices and experiences. Brian Moore is a Senior Data Lake Solutions Architect with Upsolver. He is also a long time member of Data"
YouTube Link 2022-02-07T12:00Z [----] followers, [---] engagements
"Azure Synapse: Automated Tests with FluidTest Testing code during development and as part of deployment is a best practice in Data Engineering and Software Development. How do we configure this pattern in Azure Synapse Analytics a powerful but relatively new platform for analytics in the cloud A special guest Mark Cunningham joins to share about automated testing within the Azure Synapse environment with FluidTest -- an open source library he created. The patterns discussed can work with many Azure resources beyond Azure Synapse Pipelines and contributors are welcome to continue extending it."
YouTube Link 2022-01-28T07:05Z [----] followers, [----] engagements
"Synapse Spark: Add External Python Libraries Azure Synapse Spark pools come with all the Anaconda libraries installed but you will likely find a need for additional libraries or different versions of libraries. It is fairly simple when the required libraries exist in a repository like Conda or PyPI (Python Packaging Index). If you build your own Python WHL file (or download one) then you can add those with just a few steps. In this video I show you how to add Python libraries to your Spark pool either with session packages or workspace packages. Check out a written tutorial here:"
YouTube Link 2022-01-20T12:00Z [----] followers, [----] engagements
"Azure Synapse Spark with Kafka - Installing JAR files My story of how I got Apache Kafka working with Apache Spark on Azure Synapse Analytics Spark pools. This video shows how to use the Manage Packages capabilities of Azure Synapse and which version of the Java libraries (JAR files) actually work currently. More from Dustin: Website: https://dustinvannoy.com LinkedIn: https://www.linkedin.com/in/dustinvannoy Twitter: https://twitter.com/dustinvannoy Github: https://github.com/datakickstart CHAPTERS: 0:00 Intro / Use Case 2:00 Spark Pool setup 2:32 Managed Packages 3:50 Run and Validate 4:54"
YouTube Link 2021-12-17T00:44Z [----] followers, [----] engagements
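Once the Kafka JARs are in place as the video describes, a Structured Streaming read might look like the hedged sketch below. The broker, topic, and paths are placeholders, and an Event Hubs Kafka endpoint would additionally need the SASL authentication options, which are not shown here.

```python
# Hedged sketch: read from Kafka with Structured Streaming and land raw events as Delta.
stream = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "mybroker.example.com:9092")  # placeholder
          .option("subscribe", "events")
          .option("startingOffsets", "latest")
          .load())

parsed = stream.selectExpr("CAST(key AS STRING) AS key",
                           "CAST(value AS STRING) AS value",
                           "timestamp")

query = (parsed.writeStream.format("delta")
         .option("checkpointLocation", "/checkpoints/events_raw")  # placeholder path
         .outputMode("append")
         .start("/data/raw/events_stream"))
```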
"Azure Stream Analytics with Event Hubs In this video I introduce Azure Stream Analytics by walking through building a Stream Analytics Job from scratch using Azure Event Hubs as the input and output. This video includes all the setup steps and should have enough detail for you to get your own job setup for stream processing with Event Hubs and Stream Analytics. More from Dustin: Website: https://dustinvannoy.com Twitter: https://twitter.com/dustinvannoy LinkedIn: https://www.linkedin.com/in/dustinvannoy Github: https://github.com/datakickstart CHAPTERS: 0:00 Intro 1:03 Create Job 2:34 Create"
YouTube Link 2021-11-05T13:00Z [----] followers, [----] engagements
"Data Engineering SD: Intro to Great Expectations by Allen Sallinger This user group presentation by Allen Sallinger will introduce you to the data quality tool Great Expectations (https://greatexpectations.io/). Listen in as he shows us how to use Great Expectations and some of the team from Superconductive answers our questions. Join us for the next meetup: https://www.meetup.com/Data-Engineering-San-Diego/ Data Engineering Great Expectations Data Quality Data Engineering SD Data Engineering Great Expectations Data Quality Data Engineering SD"
YouTube Link 2021-10-18T18:25Z [----] followers, [---] engagements
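As a taste of the kind of checks Great Expectations enables, here is a hedged example using the older pandas-based API that was current around the time of this talk; the modern API is organized around data contexts and validators instead, so treat this as a sketch rather than current best practice.

```python
# Hedged sketch: a few data quality expectations with the legacy Great Expectations API.
import pandas as pd
import great_expectations as ge

orders = ge.from_pandas(pd.DataFrame({
    "order_id": [1, 2, 3],
    "amount": [10.0, 25.5, 7.25],
}))

orders.expect_column_values_to_not_be_null("order_id")
orders.expect_column_values_to_be_unique("order_id")
orders.expect_column_values_to_be_between("amount", min_value=0)

results = orders.validate()
print(results.success)
```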