Monday 29 July 2019

Interacting with DataFrames

Once created (instantiated), a DataFrame object has methods attached to it. Methods are operations one can perform on DataFrames such as filtering, counting, aggregating and many others.
Example: To create (instantiate) a DataFrame, use this syntax: df = ...
To display the contents of the DataFrame, apply a show operation (method) on it using the syntax df.show().
The . (dot) indicates that you are calling a method on the object.
In working with DataFrames, it is common to chain operations together, such as: df.select().filter().orderBy().
By chaining operations together, you don't need to save intermediate DataFrames into local variables. Each method still returns a new DataFrame, but you avoid naming and keeping track of the intermediate ones.
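
To see why chaining works, here is a minimal sketch in plain Python. TinyFrame is a hypothetical toy class made up for this post, not Spark's DataFrame and not how Spark is implemented; the only point is that each method returns a new frame, so the next call can be attached directly with a dot.

```python
# Toy stand-in for a DataFrame (hypothetical, NOT Spark's implementation).
# Each method returns a new TinyFrame, which is what makes chaining possible.
class TinyFrame:
    def __init__(self, rows):
        self.rows = list(rows)  # each row is a dict of column name -> value

    def select(self, *cols):
        # Keep only the requested columns.
        return TinyFrame({c: r[c] for c in cols} for r in self.rows)

    def filter(self, predicate):
        # Keep only the rows for which predicate(row) is true.
        return TinyFrame(r for r in self.rows if predicate(r))

    def orderBy(self, col):
        # Sort rows by one column, ascending.
        return TinyFrame(sorted(self.rows, key=lambda r: r[col]))

    def show(self):
        for r in self.rows:
            print(r)


df = TinyFrame([
    {"name": "Ana", "age": 34, "city": "Lima"},
    {"name": "Bo", "age": 19, "city": "Oslo"},
    {"name": "Cy", "age": 52, "city": "Rome"},
])

# One chained expression, no intermediate variables.
result = df.select("name", "age").filter(lambda r: r["age"] > 21).orderBy("age")
result.show()
```

Note that the toy runs every step immediately, in the order written. It illustrates the chaining syntax only, not how a real engine plans or executes the work.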
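Also note that, when two orderings of the same operations produce the same result, you do not have to worry about which one is faster: the optimizer determines an efficient order of execution for you. For example, as long as the filter only uses columns that the select keeps, these two chains describe the same result: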
df.select(...).orderBy(...).filter(...)
versus
df.filter(...).select(...).orderBy(...)
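
The reason an optimizer is free to reorder such steps is that the result does not change. A small plain-Python sketch of that idea, using made-up sample rows and a hypothetical over_25 condition (this is ordinary list code, not Spark):

```python
# Made-up sample data; each row is a dict of column name -> value.
rows = [
    {"name": "Ana", "age": 34},
    {"name": "Bo", "age": 19},
    {"name": "Cy", "age": 52},
    {"name": "Di", "age": 27},
]


def over_25(row):
    # Hypothetical filter condition used for illustration.
    return row["age"] > 25


# Sort everything first, then drop the rows that fail the condition.
sort_then_filter = [r for r in sorted(rows, key=lambda r: r["age"]) if over_25(r)]

# Drop the failing rows first, then sort what is left (fewer rows to sort).
filter_then_sort = sorted((r for r in rows if over_25(r)), key=lambda r: r["age"])

same_result = sort_then_filter == filter_then_sort
print(same_result)
```

Both versions give the same rows in the same order, but filtering first leaves less data to sort. That is the kind of saving an optimizer looks for when it picks the execution order.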
