Interacting with DataFrames
Once created (instantiated), a DataFrame object has methods attached to it. Methods are operations one can perform on DataFrames such as filtering, counting, aggregating and many others.
Example: To create (instantiate) a DataFrame, use this syntax: df = ...
To display the contents of the DataFrame, apply the show operation (method) to it using the syntax df.show(). The . indicates that you are applying a method to the object.
When working with DataFrames, it is common to chain operations together, such as: df.select().filter().orderBy()
By chaining operations together, you don't need to save intermediate DataFrames into local variables (thereby avoiding the creation of extra objects).
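To see why chaining works, here is a minimal sketch in plain Python. MiniFrame is a hypothetical toy class written for this post, not the real DataFrame API; it only assumes that each method returns a new object of the same type, which is what makes the next method call possible.

```python
class MiniFrame:
    """Toy stand-in for a DataFrame (hypothetical): rows are a list of dicts."""

    def __init__(self, rows):
        self.rows = list(rows)

    def select(self, *cols):
        # Keep only the requested columns; return a NEW MiniFrame.
        return MiniFrame({c: r[c] for c in cols} for r in self.rows)

    def filter(self, predicate):
        # Keep only the rows for which predicate(row) is true.
        return MiniFrame(r for r in self.rows if predicate(r))

    def orderBy(self, col):
        # Sort rows by one column, ascending.
        return MiniFrame(sorted(self.rows, key=lambda r: r[col]))


df = MiniFrame([
    {"name": "Ann", "age": 34, "city": "Oslo"},
    {"name": "Bob", "age": 12, "city": "Rome"},
    {"name": "Cy", "age": 52, "city": "Lima"},
])

# Step by step, saving each intermediate result in a variable.
selected = df.select("name", "age")
adults = selected.filter(lambda r: r["age"] >= 18)
step_by_step = adults.orderBy("age")

# The same work as one chain, with no intermediate variables.
chained = df.select("name", "age").filter(lambda r: r["age"] >= 18).orderBy("age")
```

Both versions end up with the same rows; the chained form simply skips naming the in-between results.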
Also note that you do not have to worry about how to order operations, because the optimizer determines the optimal order of execution for you. For example:
df.select(...).orderBy(...).filter(...)
versus
df.filter(...).select(...).orderBy(...)
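The idea behind the two chains above can be illustrated with a toy model in plain Python. Everything here is an assumption made for illustration: LazyFrame is a hypothetical class, and its single "run filters first" rule is a stand-in for what a real optimizer does, not a description of how any actual engine is implemented. The sketch records operations instead of running them, then reorders them when a result is requested.

```python
class LazyFrame:
    """Toy lazy frame (hypothetical): records operations, runs them on collect()."""

    def __init__(self, rows, plan=()):
        self.rows = list(rows)
        self.plan = tuple(plan)  # recorded operations, not yet executed

    def _add(self, op):
        return LazyFrame(self.rows, self.plan + (op,))

    def select(self, *cols):
        return self._add(("select", cols))

    def filter(self, predicate):
        return self._add(("filter", predicate))

    def orderBy(self, col):
        return self._add(("orderBy", col))

    def optimized_plan(self):
        # Toy rule: run all filters first so later steps see fewer rows.
        # Only safe here because the filters use columns that select keeps.
        filters = [op for op in self.plan if op[0] == "filter"]
        others = [op for op in self.plan if op[0] != "filter"]
        return filters + others

    def collect(self):
        rows = list(self.rows)
        for kind, arg in self.optimized_plan():
            if kind == "filter":
                rows = [r for r in rows if arg(r)]
            elif kind == "orderBy":
                rows = sorted(rows, key=lambda r: r[arg])
            elif kind == "select":
                rows = [{c: r[c] for c in arg} for r in rows]
        return rows


df = LazyFrame([
    {"name": "Ann", "age": 34, "city": "Oslo"},
    {"name": "Bob", "age": 19, "city": "Rome"},
    {"name": "Cy", "age": 52, "city": "Lima"},
])


def over_21(row):
    return row["age"] > 21


first = df.select("name", "age").orderBy("age").filter(over_21)
second = df.filter(over_21).select("name", "age").orderBy("age")

# Written in different orders, but both end up with the same plan order.
same_plan = [op[0] for op in first.optimized_plan()] == \
            [op[0] for op in second.optimized_plan()]
result_first = first.collect()
result_second = second.collect()
```

In this toy, the first chain is rewritten to filter, select, orderBy, while the second becomes filter, orderBy, select; the two return the same rows either way.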