PysparkNote104---join: joining tables
2022-08-06 14:34:00 【维格堂406小队】
Intro
Notes on using join in PySpark. Reading the API docs may well be enough...
Building the data
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
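def get_or_create(app_name):
    """Create (or reuse) a SparkSession with the settings used in these notes."""
    spark = (
        SparkSession.builder.appName(app_name)
        .config("spark.driver.maxResultSize", "10g")
        # Deprecated since Spark 3.0; the newer key is spark.sql.execution.arrow.pyspark.enabled
        .config("spark.sql.execution.arrow.enabled", "true")
        .config("spark.dynamicAllocation.enabled", "false")
        .config("spark.sql.crossJoin.enabled", "true")
        .config("spark.kryoserializer.buffer.max", "512m")
        .getOrCreate()
    )
    # Only show errors in the log output
    spark.sparkContext.setLogLevel("ERROR")
    return spark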
import pandas as pd
df1 = pd.DataFrame({"name": ["A", "B"], "name1": ["A", "B"], "age": [10, 20]})
df2 = pd.DataFrame({"name": ["A"], "name1": ["A"], "sex": ["male"]})
spark = get_or_create("spark")
df_spark1 = spark.createDataFrame(df1)
df_spark2 = spark.createDataFrame(df2)
df_spark1.show(truncate=False)
+----+-----+---+
|name|name1|age|
+----+-----+---+
|A   |A    |10 |
|B   |B    |20 |
+----+-----+---+
df_spark2.show(truncate=False)
+----+-----+----+
|name|name1|sex |
+----+-----+----+
|A   |A    |male|
+----+-----+----+
join
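The main thing to watch is how usage differs depending on whether the join columns have the same name or different names. It is all in help(), so you can check it yourself: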
:param other: Right side of the join
:param on: a string for the join column name, a list of column names,
a join expression (Column), or a list of Columns.
If `on` is a string or a list of strings indicating the name of the join column(s),
the column(s) must exist on both sides, and this performs an equi-join.
:param how: str, default ``inner``. Must be one of: ``inner``, ``cross``, ``outer``,
``full``, ``full_outer``, ``left``, ``left_outer``, ``right``, ``right_outer``,
``left_semi``, and ``left_anti``.
When the join column has the same name on both sides
df_spark1.join(other=df_spark2,on=['name'],how='left').show()
+----+-----+---+-----+----+
|name|name1|age|name1| sex|
+----+-----+---+-----+----+
|   B|    B| 20| null|null|
|   A|    A| 10|    A|male|
+----+-----+---+-----+----+
When several join columns have the same names
df_spark1.join(other=df_spark2,on=['name','name1'],how='left').show()
+----+-----+---+----+
|name|name1|age| sex|
+----+-----+---+----+
|   A|    A| 10|male|
|   B|    B| 20|null|
+----+-----+---+----+
When the join columns have different names
df_spark1.join(other=df_spark2,on=[df_spark1.name==df_spark2.name1],how='left').show()
+----+-----+---+----+-----+----+
|name|name1|age|name|name1| sex|
+----+-----+---+----+-----+----+
|   B|    B| 20|null| null|null|
|   A|    A| 10|   A|    A|male|
+----+-----+---+----+-----+----+
Multiple join conditions
df_spark1.join(other=df_spark2,on=[df_spark1.name==df_spark2.name1,df_spark1.name1==df_spark2.name],how='left').show()
+----+-----+---+----+-----+----+
|name|name1|age|name|name1| sex|
+----+-----+---+----+-----+----+
|   A|    A| 10|   A|    A|male|
|   B|    B| 20|null| null|null|
+----+-----+---+----+-----+----+
df_spark1.join(other=df_spark2,on=[df_spark1.name==df_spark2.name1,df_spark1.name1==df_spark2.name],how='outer').show()
+----+-----+---+----+-----+----+
|name|name1|age|name|name1| sex|
+----+-----+---+----+-----+----+
|   A|    A| 10|   A|    A|male|
|   B|    B| 20|null| null|null|
+----+-----+---+----+-----+----+
That wraps up this short introduction to the basic usage.
2022-08-04, Jiulonghu, Jiangning District, Nanjing