Apache Spark notebooks
After creating your Databricks workspace, it's time to create your first notebook and Spark cluster.
What is an Apache Spark notebook?
A notebook is a collection of cells that you run to execute code, render formatted text, or display graphical visualizations.
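For example, a code cell contains ordinary Python that runs on the attached cluster, while a cell beginning with the %md magic command renders as formatted text instead of executing. A minimal sketch of two such cells, shown here as a single commented block:

```python
# Cell 1: a code cell -- ordinary Python that runs on the attached cluster.
print("Hello from a Databricks notebook cell")

# Cell 2: a text cell -- in Databricks, a cell that starts with the %md
# magic command renders as Markdown rather than running as code:
#   %md ## This heading is rendered as formatted text
```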
What is a cluster?
Notebooks are backed by clusters: networked computers that work together to process your data. The first step is to create a cluster.
Create a cluster
- In the Azure portal, click All resources menu on the left side navigation and select the Databricks workspace you created in the last unit.
- Select Launch Workspace to open your Databricks workspace in a new tab.
- In the left-hand menu of your Databricks workspace, select Clusters.
- Select Create Cluster to add a new cluster.
- Enter a name for your cluster. Use your name or initials to easily differentiate your cluster from your coworkers' clusters.
- Select the Databricks Runtime version. We recommend the latest runtime (4.0 or newer) and Scala 2.11.
- Specify your cluster configuration.
- For clusters created on the Community Edition, the default values are sufficient for the remaining fields.
- For all other environments, refer to your company's policy on creating and using clusters.
- Select Create Cluster.
Note
Check with your local system administrator to see if there is a recommended default cluster at your company to use for the rest of the class. Creating a new cluster will incur costs.
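If you prefer to script cluster creation instead of clicking through the UI, the Databricks REST API exposes a clusters/create endpoint. The sketch below is a minimal example, assuming a DATABRICKS_HOST workspace URL and a DATABRICKS_TOKEN personal access token are set in the environment; the runtime string and Azure node type shown are placeholders, so substitute values your workspace actually supports.

```python
# Minimal sketch: create a cluster via the Databricks REST API (clusters/create).
# Assumes DATABRICKS_HOST and DATABRICKS_TOKEN are set in the environment.
import os
import requests

host = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.azuredatabricks.net
token = os.environ["DATABRICKS_TOKEN"]  # a personal access token

payload = {
    "cluster_name": "my-first-cluster",
    "spark_version": "4.0.x-scala2.11",  # placeholder; list valid strings via clusters/spark-versions
    "node_type_id": "Standard_DS3_v2",   # example Azure node type
    "num_workers": 2,
}

resp = requests.post(
    f"{host}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
)
resp.raise_for_status()
print("Created cluster:", resp.json()["cluster_id"])
```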
Create a notebook
- On the left-hand menu of your Databricks workspace, select Home.
- Right-click on your home folder.
- Select Create.
- Select Notebook.
- Name your notebook First Notebook.
- Set the Language to Python.
- Select the cluster to which to attach this notebook. Note: this option displays only when a cluster is currently running. You can still create your notebook and attach it to a cluster later.
- Select Create.
Now that you've created your notebook, let's use it to run some code.
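For example, Databricks notebooks pre-define a SparkSession named spark, so a minimal first cell might build a small DataFrame and show it:

```python
# `spark` is the SparkSession that Databricks pre-defines in every notebook.
# Build a small DataFrame and print its contents.
df = spark.range(10).withColumnRenamed("id", "number")
df.show()

# display() is a Databricks-specific helper that renders DataFrames
# as rich, plottable output in the notebook.
display(df)
```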
Attach and detach your notebook
To use your notebook to run code, you must attach it to a cluster. You can also detach your notebook from a cluster and attach it to another, depending on your organization's requirements.
If your notebook is attached to a cluster, you can:
- Detach your notebook from the cluster
- Restart the cluster
- Attach to another cluster
- Open the Spark UI
- View the log files of the driver
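A quick, minimal way to confirm the attachment is working is to query the pre-defined entry points from a cell; if the notebook is not attached to a running cluster, the cell won't run at all.

```python
# When the notebook is attached, `spark` (SparkSession) and `sc`
# (SparkContext) are already defined by Databricks.
print("Spark version:", spark.version)
print("Default parallelism:", sc.defaultParallelism)
```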