Monday 29 July 2019

DataFrame: how to groupBy, alias the count, then filter on the count



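from pyspark.sql.functions import count
df.groupBy("x").agg(count("*").alias("cnt")).filter("cnt > 1")  # filter on the alias; the threshold 1 is only an example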



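from pyspark.sql.functions import count, desc

# Filter on gender before narrowing to firstName, so the column is still present.
top10FemaleFirstNamesDF = (
    peopleDF
    .filter("gender == 'F'")
    .select("firstName")
    .groupBy("firstName")
    .agg(count("*").alias("cnt"))
    .sort(desc("cnt"))
    .limit(10)
)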
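
To sanity-check what the pipeline above should return, here is a rough plain-Python mirror of the same steps (filter, group, count, filter on the count, sort descending, limit). It is only a sketch and does not use Spark: the sample rows, the function name `top_female_first_names` and the `min_count` parameter are made up for illustration. In Spark itself, the "filter on count" step would be a `.filter("cnt >= 2")` chained after the `agg(...)`.

```python
from collections import Counter

# Hypothetical sample rows standing in for peopleDF: (firstName, gender).
people = [
    ("Mary", "F"), ("Anna", "F"), ("Mary", "F"),
    ("John", "M"), ("Anna", "F"), ("Mary", "F"),
    ("Emma", "F"),
]

def top_female_first_names(rows, limit=10, min_count=1):
    """Plain-Python mirror of: filter -> groupBy -> count -> filter on count -> sort -> limit."""
    # filter("gender == 'F'") followed by groupBy("firstName").agg(count("*"))
    counts = Counter(first_name for first_name, gender in rows if gender == "F")
    # filter on the aggregated count (the "cnt" alias in the Spark version)
    kept = [(name, cnt) for name, cnt in counts.items() if cnt >= min_count]
    # sort(desc("cnt")) then limit(n); the tie-break on name is an addition
    # here so the result is deterministic (Spark does not guarantee tie order)
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return kept[:limit]

print(top_female_first_names(people))
print(top_female_first_names(people, min_count=2))
```

Running the Spark version on a small sample and comparing it with a hand-computed result like this is a quick way to confirm that the alias and the filter behave as intended.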
