Azure: Databricks Notebook Fundamentals

A notebook is a collection of cells. These cells are run to execute code, render formatted text, or display graphical visualizations.

Understanding Code Cells and Markdown Cells

Cmd 5

# This is a code cell
# By default, a new cell added in a notebook is a code cell
1 + 1

Out[3]: 2

Command took 0.04 seconds -- by girdhar.ramit@gmail.com at 7/28/2019, 2:35:30 PM on Test Cluster

Cmd 6

In this case, the code is written in Python. The default language for a cell is set by the notebook, as you can see from the title of the notebook near the top left of the window. Here it reads 01 Notebook Fundamentals (Python). The notebook language is always shown in parentheses following the notebook title.

In order to run a notebook cell, your notebook must be attached to a cluster. If your notebook is not attached to a cluster, you will be prompted to do so before the cell can run.

To attach a notebook to a cluster:

In the notebook toolbar, click Detached.
From the drop-down, select a cluster.

The code cell above has not run yet, so the expression 1 + 1 has not been evaluated. To run the code cell, select the cell by placing your cursor within the cell text area and do any of the following:

Press Shift + Enter (to run the current cell and advance to the next cell)
Press Ctrl + Enter (to run the current cell, but keep the current cell selected)
Use the cell actions menu at the far right of the cell and select the Run Cell option.

Cmd 8

# This is also a code cell
print("Welcome to the fabulous world of Databricks!")

Welcome to the fabulous world of Databricks!

Command took 0.03 seconds -- by girdhar.ramit@gmail.com at 7/28/2019, 2:35:49 PM on Test Cluster

Cmd 10

This is a markdown cell.

To create a markdown cell you need to use a "magic", which is the short name for a Databricks magic command.

Magic commands start with %.

The magic for markdown is %md.

The magic command must always be the first text within the cell.

The following provides the list of supported magics:

%python - Allows you to execute Python code in the cell.
%r - Allows you to execute R code in the cell.
%scala - Allows you to execute Scala code in the cell.
%sql - Allows you to execute SQL statements in the cell.
%sh - Allows you to execute Bash shell commands and code in the cell.
%fs - Allows you to execute Databricks Filesystem commands in the cell.
%md - Allows you to render Markdown syntax as formatted content in the cell.
%run - Allows you to run another notebook from a cell in the current notebook.

To read more about magics see here.
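
The rule that a magic must be the first text in the cell can be sketched in plain Python. This is only an illustration, not how Databricks parses cells: `split_magic` and `SUPPORTED_MAGICS` are hypothetical names, and ignoring leading whitespace is an assumption.

```python
# Hypothetical helper: illustrates the "magic must come first" rule only.
# The set below mirrors the list of magics given above.
SUPPORTED_MAGICS = {"python", "r", "scala", "sql", "sh", "fs", "md", "run"}


def split_magic(cell_text, default_language="python"):
    """Return (magic, body) for the text of a single cell.

    If the cell does not start with a supported magic, the notebook's
    default language is returned together with the unchanged text.
    """
    stripped = cell_text.lstrip()  # assumption: leading whitespace is ignored
    if stripped.startswith("%"):
        # The magic name is the first whitespace-delimited token after '%'.
        parts = stripped[1:].split(None, 1)
        if parts and parts[0] in SUPPORTED_MAGICS:
            body = parts[1] if len(parts) > 1 else ""
            return parts[0], body
    return default_language, cell_text


magic, body = split_magic("%md This is a markdown cell.")
print(magic, "|", body)
```

A `%` that appears later in the cell is not treated as a magic here, so `split_magic("x = 1\n%sql SELECT 1")` falls back to the default language.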

displayHTML("<iframe src='https://bing.com' width='100%' height='350px'/>")

"Hello Databricks world!"

Cmd 21

"Hello Databricks world!"
"And, hello Microsoft Ignite!"

Cmd 23

print("Hello Databricks world!")
print("And, hello Microsoft Ignite!")
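
The cells above seem to contrast bare expressions with print calls. In a Python notebook, only the value of the last expression in a cell is echoed as the cell result, while every print call writes its own line. A rough sketch of that behavior, using a hypothetical `run_cell` helper built on the standard `ast` module (this is not Databricks's actual implementation):

```python
import ast


def run_cell(source, namespace):
    """Execute a cell; return the value of a trailing expression, else None."""
    tree = ast.parse(source)
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        # Split off the final expression so its value can be returned.
        last = ast.Expression(tree.body.pop().value)
        exec(compile(tree, "<cell>", "exec"), namespace)
        return eval(compile(last, "<cell>", "eval"), namespace)
    # The cell ends with a statement (for example an assignment): no result.
    exec(compile(tree, "<cell>", "exec"), namespace)
    return None


ns = {}
two_strings = run_cell('"Hello Databricks world!"\n"And, hello Microsoft Ignite!"', ns)
assignment = run_cell('text_variable = "Hello, hello!"', ns)
print(repr(two_strings))
print(repr(assignment))
```

Under this sketch the first string in the two-string cell is evaluated and discarded, which is why printing is needed when both lines should appear in the output.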

Cmd 25

text_variable="Hello, hello!"

You can move cells up or down within the notebook to fit your needs.

You can do so using the UI or via keyboard shortcuts.

When in command mode, you can cut and paste entire cells using keyboard shortcuts. To do so, select the cell and then press X to cut it. Use the up or down arrow keys to find the cell around which it should be pasted. Press V to paste the cell below the selected cell or press Shift + V to paste the cell above the selected cell.

You can also move cells using the UI, by accessing the notebook cell menu at the far right, selecting the down caret and then selecting either Move Up or Move Down or the Cut Cell and Paste Cell options.

x=10

Cmd 47

y=x+1
y

Cmd 49

x=100

Cmd 50

Now select the cell that has the lines y = x + 1 and y, and re-run that cell. Did the value of y meet your expectation?

The value of y should now be 101. This is because it is not the actual order of the cells that determines the value, but the order in which they are run and how that affects the underlying state itself. To understand this, realize that when the code x = 100 was run, this changed the value of x, and then when you re-ran the cell containing y = x + 1 this evaluation used the current value of x which is 100. This resulted in y having a value of 101 and not 11.
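
The same effect can be reproduced outside a notebook. The sketch below is a simplified model, assuming that every cell runs against one shared namespace; the `cells` dictionary, its keys, and the `run` helper are made up for illustration.

```python
# One shared namespace stands in for the notebook's underlying state.
namespace = {}

# Hypothetical cell labels; the cell bodies are the ones used above.
cells = {
    "cell_a": "x = 10",
    "cell_b": "y = x + 1",
    "cell_c": "x = 100",
}


def run(cell_id):
    """Run one cell's source against the shared namespace."""
    exec(cells[cell_id], namespace)


# Run the cells top to bottom, as they appear in the notebook.
for cell_id in ("cell_a", "cell_b", "cell_c"):
    run(cell_id)
y_first = namespace["y"]   # computed while x was still 10

# Re-run only the middle cell: it now sees the current x, which is 100.
run("cell_b")
y_second = namespace["y"]

print(y_first, y_second)
```

The position of `cell_b` in the dictionary never changes; only the order of the `run` calls does, and that is what determines the value of y.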

Cmd 51

Clearing state and output

You can use the Clear dropdown on the notebook toolbar to remove output (results), or to remove output and clear the underlying state.

Clear -> Clear Results (removes the displayed output for all cells)
Clear -> Clear State (removes all cell states)

You typically do this when you want to cleanly re-run a notebook you have been working on and eliminate any accidental changes to the state that may have occurred while you were authoring the notebook.
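
The difference between the two options can be sketched with two plain dictionaries, one for displayed results and one for variables. This is only an analogy under that assumption; `outputs` and `state` are illustrative names, not Databricks objects.

```python
state = {}      # stands in for the notebook's variables
outputs = {}    # stands in for the results shown under each cell

exec("x = 100", state)
exec("y = x + 1", state)
outputs["cell_b"] = state["y"]

# Analogue of Clear Results: the displayed output goes away,
# but the variables are still defined.
outputs.clear()
still_defined = "y" in state

# Analogue of Clear State: the variables themselves are gone, so a cell
# that depends on x fails until the cell defining x is run again.
state.clear()
try:
    exec("y = x + 1", state)
    error_name = None
except NameError as err:
    error_name = type(err).__name__

print(still_defined, error_name)
```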

Saturday, 27 July 2019

Databricks Notebook Fundamentals

Understanding Code Cells and Markdown Cells

Supported Markdown content

Understanding cell output

Running multiple notebook cells

Navigating Cells

Managing notebook Cells

Keyboard shortcuts for adding and removing cells

Adding and removing cells using the UI

Adjusting cell order

Understanding notebook state

Clearing state and output

Introducing Spark DataFrames

Introducing Pandas DataFrames
