pyspark.sql.DataFrame.union
DataFrame.union(other: pyspark.sql.dataframe.DataFrame) → pyspark.sql.dataframe.DataFrame
Return a new DataFrame containing the union of rows in this and another DataFrame.
New in version 2.0.0.
Changed in version 3.4.0: Supports Spark Connect.
Notes
This is equivalent to UNION ALL in SQL. To do a SQL-style set union (that does deduplication of elements), use this function followed by distinct().
Also, as is standard in SQL, this function resolves columns by position (not by name).
Examples
>>> df1 = spark.createDataFrame([[1, 2, 3]], ["col0", "col1", "col2"])
>>> df2 = spark.createDataFrame([[4, 5, 6]], ["col1", "col2", "col0"])
>>> df1.union(df2).show()
+----+----+----+
|col0|col1|col2|
+----+----+----+
|   1|   2|   3|
|   4|   5|   6|
+----+----+----+
>>> df1.union(df1).show()
+----+----+----+
|col0|col1|col2|
+----+----+----+
|   1|   2|   3|
|   1|   2|   3|
+----+----+----+