Wednesday 31 July 2019

Apache Spark - Join and Aggregation

Q: What is the DataFrame equivalent of the SQL statement SELECT count(*) AS total?
A: .agg(count("*").alias("total"))
Q: What is the DataFrame equivalent of the SQL statement SELECT firstName FROM PeopleDistinctNames INNER JOIN SSADistinctNames ON firstName = ssaFirstName?
A: peopleDistinctNamesDF.join(ssaDistinctNamesDF, col("firstName") === col("ssaFirstName")).select("firstName") (Scala; in PySpark the column comparison is written with == instead of ===)
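The two answers above can be tried out together in a small standalone program. This is only a minimal sketch, assuming Scala with Spark SQL on the classpath and a local master; the object name JoinAndAggregationSketch and the sample names in both DataFrames are made up for illustration and are not from the original exercise.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, count}

// Hypothetical object name, used only for this sketch.
object JoinAndAggregationSketch {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation only.
    val spark = SparkSession.builder()
      .appName("join-agg-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data standing in for the two tables.
    val peopleDistinctNamesDF = Seq("Alice", "Bob", "Zed").toDF("firstName")
    val ssaDistinctNamesDF = Seq("Alice", "Bob", "Carol").toDF("ssaFirstName")

    // SELECT count(*) AS total FROM PeopleDistinctNames
    val totalDF = peopleDistinctNamesDF.agg(count("*").alias("total"))
    totalDF.show()

    // SELECT firstName FROM PeopleDistinctNames
    //   INNER JOIN SSADistinctNames ON firstName = ssaFirstName
    // join() performs an inner join when no join type is given.
    val joinedDF = peopleDistinctNamesDF
      .join(ssaDistinctNamesDF, col("firstName") === col("ssaFirstName"))
      .select("firstName")
    joinedDF.show()

    spark.stop()
  }
}
```

Referring to the columns by bare name with col works here because the two column names differ; if both DataFrames had a column with the same name, the condition would need to qualify each side, for example peopleDistinctNamesDF("firstName").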
