Wednesday 31 July 2019

Multiple conditions for filter in Spark DataFrames


Problem: the following filter raises a ValueError, because the two Column conditions are combined without parentheses:

# TODO
carensDF = peopleDF.filter(col("firstName") == "Caren" & col("gender") == "F").limit(10)
display(carensDF)

Solution

from pyspark.sql.functions import col

carensDF = peopleDF.filter((col("firstName") == "Caren") & (col("gender") == "F")).limit(10)

Separate the two conditions in filter with parentheses: `&` binds more tightly than `==`, so each comparison must be wrapped before they are combined.
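The parentheses are needed purely because of Python operator precedence, and the same rule can be seen with plain integers, no Spark required:

```python
# '&' has higher precedence than '==' in Python, so without parentheses
# the bitwise AND is evaluated first, grouping the expression the wrong way.
unparenthesized = 1 == 1 & 0          # parsed as 1 == (1 & 0) -> 1 == 0 -> False
parenthesized = (1 == 1) & (0 == 0)   # comparisons first -> True & True -> True
print(unparenthesized, parenthesized)
```

This is exactly why `(col("firstName") == "Caren") & (col("gender") == "F")` works while the unparenthesized form does not.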

Alternatively, chain two filter calls:

carensDF = peopleDF.filter(col("firstName") == "Caren").filter(col("gender") == "F").limit(10)


The error raised when Python's `and` is used instead of `&`:

Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<command-3332536293827331> in <module>()
      1 # TODO
----> 2 carensDF = peopleDF.filter(col("firstName") == "Caren" and col("gender")== "F").limit(10)
      3 display(carensDF)

/databricks/spark/python/pyspark/sql/column.py in __nonzero__(self)
    680
    681     def __nonzero__(self):
--> 682         raise ValueError("Cannot convert column into bool: please use '&' for 'and', '|' for 'or', "
    683                          "'~' for 'not' when building DataFrame boolean expressions.")
    684     __bool__ = __nonzero__

ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
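The traceback shows why `and` can never work here: Python's `and` eagerly calls `bool()` on its left operand, and Spark's Column raises in `__bool__` because the truth value of an unevaluated column expression is undefined. A minimal sketch of that behavior, using a hypothetical `FakeColumn` class (not the real pyspark implementation):

```python
class FakeColumn:
    """Hypothetical stand-in mimicking pyspark.sql.Column's refusal to act as a bool."""

    def __and__(self, other):
        # '&' builds a combined expression object instead of forcing a truth value
        return FakeColumn()

    def __bool__(self):
        # mirrors Column.__nonzero__ in the traceback above
        raise ValueError(
            "Cannot convert column into bool: please use '&' for 'and', "
            "'|' for 'or', '~' for 'not' when building DataFrame boolean expressions.")

a, b = FakeColumn(), FakeColumn()
combined = a & b     # fine: __and__ returns a new expression object
try:
    a and b          # 'and' must call bool(a), which raises
except ValueError as e:
    print(e)
```

The `&` operator keeps everything symbolic, which is what Spark needs to build the query plan; `and` forces an immediate boolean answer that does not exist yet.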



