Azure Databricks and Spark SQL (Python) Description
As careers in data analytics grow, so does the demand for big data tools such as Databricks and Spark, along with the need to understand them well enough for practical use. That’s where Azure Databricks and Spark SQL (Python) by Malvik Vaghadia comes in handy.
The course delivers its material in an engaging and concise way, even for technically heavy topics such as PySpark, Spark SQL in Python, and the Databricks Lakehouse architecture.
Here is what you will learn in this course:
- Course Overview / Introduction to Spark and Databricks
  - Course Introduction
  - Big Data
  - Hadoop, Spark, and Databricks
  - Apache Spark Architecture
  - Spark vs Databricks Comparison
  - Resource: Comparing Apache Spark vs Databricks
- Azure and Databricks Set Up
  - Azure Account Set Up
  - Azure UI Overview
  - Resource: Azure Resources
  - Creating your Databricks Service
  - Databricks UI Overview
  - Clusters
  - Resource: Pricing, Cluster Pools, and Runtime Versions
  - How to use Databricks Notebooks
  - Mix Languages and add Markdown text in your Notebook
  - Databricks Utilities Module and FileStore Utilities
  - Resource: How to use Notebooks
  - IMPORTANT – Download Course Resource Notebooks
  - Cost Management and Cancelling your Subscription
  - Resource: Cancelling your Azure Subscription
- Reading and Writing Data
  - Dataset Download
  - Databricks FileStore
  - Resource: File Types
  - Reading Data
  - Writing Data
  - Parquet Files
  - Deleting Files and Folders
- Data Analysis and Transformation with SparkSQL
  - Selecting and Renaming Columns
  - Adding New Columns
  - Changing Data Types
  - Math Functions and Simple Arithmetic
  - Sort Functions
  - String Functions
  - Datetime Functions
  - Filtering DataFrames
  - Conditional Statements
  - Using SQL Expressions with expr()
  - Removing Columns
  - Grouping your DataFrame
  - Pivot your DataFrame
  - Joining DataFrames
  - Union
  - Unpivot your DataFrame
  - Pandas
- Utilising the Medallion Architecture in Databricks
  - Medallion Architecture
  - Resource: Medallion Architecture
- Challenge Section: Customer Orders
  - Dataset Download and DBFS Upload
  - Assignment 1: Bronze to Silver
  - Assignment 1 Solutions Walkthrough
  - Assignment 2: Silver to Gold
  - Assignment 2 Solutions Walkthrough
- Visualizations and Dashboards
  - Visualizations and Dashboards
- Accessing Data from Azure Data Lake Storage (ADLS) with Databricks
  - Creating an ADLS Gen2 Account
  - (Optional) Storage Explorer
  - Accessing via Access Keys
  - Accessing via SAS Token
  - Mounting ADLS to DBFS Overview
  - Mounting ADLS to DBFS Demo
  - Secret Scopes
  - End to End Walkthrough Example
- Hive Metastore, Databases, Tables and Views
  - Running SQL on DataFrames
  - Hive Metastore and Creating Databases
  - Managed Tables
  - Specifying a Location for your Underlying Managed Table Data
  - Unmanaged (External) Tables
  - Permanent Views
- Challenge Section: Employees
  - Dataset Download and ADLS Upload
  - Assignment: Employees
  - Assignment Solutions Walkthrough