Saturday 27 July 2019

Azure Databricks - Create Cluster and Notebooks

Apache Spark notebooks

After creating your Databricks workspace, it's time to create your first notebook and Spark cluster.

What is an Apache Spark notebook?

A notebook is a collection of cells. These cells are run to execute code, to render formatted text, or to display graphical visualizations.
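
For example, a new Python notebook typically mixes cells like the following. This is a minimal sketch: %md is the Databricks magic command that renders a cell as Markdown, display() is the built-in Databricks renderer for tables and charts, and spark is the SparkSession that Databricks predefines once the notebook is attached to a running cluster.

    Cell 1 (formatted text):
        %md
        ## My first notebook
        This cell renders as formatted text, not code.

    Cell 2 (code):
        total = sum([1, 2, 3, 4, 5])
        print(total)

    Cell 3 (visualization):
        display(spark.range(5))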

What is a cluster?

Notebooks are backed by clusters: networked computers that work together to process your data. The first step is to create a cluster.

Create a cluster

  1. In the Azure portal, select the All resources menu in the left-hand navigation, and then select the Databricks workspace you created in the previous unit.
  2. Select Launch Workspace to open your Databricks workspace in a new tab.
  3. In the left-hand menu of your Databricks workspace, select Clusters.
  4. Select Create Cluster to add a new cluster.
    (Screenshot: the Create Cluster page)
  5. Enter a name for your cluster. Use your name or initials to easily differentiate your cluster from your coworkers'.
  6. Select the Databricks Runtime Version. We recommend the latest runtime (4.0 or newer) and Scala 2.11.
  7. Specify your cluster configuration.
    • For clusters created in Community Edition, the default values are sufficient for the remaining fields.
    • For all other environments, refer to your company's policy on creating and using clusters.
  8. Select Create Cluster.
 Note
Check with your local system administrator to see whether there is a recommended default cluster at your company to use for the rest of the class. Creating a new cluster will incur costs.
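
If you prefer to script cluster creation rather than click through the portal, the same settings can be sent to the Databricks Clusters REST API (POST /api/2.0/clusters/create). The sketch below is illustrative only: the workspace URL, token, runtime string, and node type are placeholders you should replace with values offered in your own workspace, and your company's cluster policy still applies.

    # A minimal sketch of scripted cluster creation via the Clusters REST API.
    # All values in angle brackets, the runtime string, and the node type are
    # placeholders -- substitute what your workspace actually offers.
    import requests

    workspace_url = "https://<your-region>.azuredatabricks.net"
    token = "<personal-access-token>"

    cluster_spec = {
        "cluster_name": "my-initials-cluster",
        "spark_version": "4.0.x-scala2.11",   # runtime chosen in step 6
        "node_type_id": "Standard_DS3_v2",    # an Azure VM size
        "num_workers": 2,
        "autotermination_minutes": 60,        # stop idle clusters to limit cost
    }

    resp = requests.post(
        workspace_url + "/api/2.0/clusters/create",
        headers={"Authorization": "Bearer " + token},
        json=cluster_spec,
    )
    resp.raise_for_status()
    print(resp.json()["cluster_id"])          # ID of the newly created cluster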

Create a notebook

  1. On the left-hand menu of your Databricks workspace, select Home.
  2. Right-click on your home folder.
  3. Select Create.
  4. Select Notebook.
    (Screenshot: the menu option to create a new notebook)
  5. Name your notebook First Notebook.
  6. Set the Language to Python.
  7. Select the cluster to which to attach this notebook.
     Note
    This option displays only when a cluster is currently running. You can still create your notebook and attach it to a cluster later.
  8. Select Create.
Now that you've created your notebook, let's use it to run some code.
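
As a quick sanity check, paste a cell like the one below into First Notebook and press Shift+Enter to run it. The spark session is predefined in every Databricks notebook once it is attached to a running cluster; the column name and row count here are arbitrary.

    # spark is the SparkSession Databricks predefines in an attached notebook.
    # Build a small DataFrame on the cluster and render it with display().
    df = spark.range(1, 1001).withColumnRenamed("id", "number")
    print(df.count())        # a distributed action, executed on the cluster
    display(df.limit(10))    # Databricks' built-in table/chart renderer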

Attach and detach your notebook

To use your notebook to run code, you must first attach it to a cluster. You can also detach your notebook from a cluster and attach it to another, depending on your organization's requirements.
(Screenshot: the options available when a notebook is attached to a cluster)
If your notebook is attached to a cluster, you can:
  • Detach your notebook from the cluster
  • Restart the cluster
  • Attach to another cluster
  • Open the Spark UI
  • View the log files of the driver
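
Keep in mind that detaching a notebook clears its execution state, so variables defined in earlier cells must be recomputed after you reattach. A quick way to confirm what you are attached to is to query the predefined session and context, as in the sketch below; defaultParallelism is only a rough indicator of the cluster's size.

    # After reattaching, earlier variables are gone -- re-run your setup cells.
    # These predefined objects confirm the notebook is attached, and to what.
    print(spark.version)            # Spark version of the attached runtime
    print(sc.defaultParallelism)    # rough indicator of available cores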
