04-Functions
2022-04-22 04:08:00 【wangyanglongcc】
join
Joins two DataFrames on a given condition. The related crossJoin returns the Cartesian product of the two DataFrames.
Parameters
other : DataFrame
Right side of the join
on : str, list or Column, optional
a string for the join column name, a list of column names, a join expression (Column), or a list of Columns. If on is a string or a list of strings indicating the name of the join column(s), the column(s) must exist on both sides, and this performs an equi-join.
how : str, optional
Default inner. Must be one of: inner, cross, outer, full, fullouter, full_outer, left, leftouter, left_outer, right, rightouter, right_outer, semi, leftsemi, left_semi, anti, leftanti and left_anti.
cond = [df['name'] == df1['name'], df['age'] == df1['age']]
df.join(df1, cond, 'outer').select(df.name, df1.age)
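The how values above are aliases for a handful of join types. The sketch below is a plain-Python model of equi-join semantics on lists of dicts, not Spark code: HOW_ALIASES, equi_join and the sample rows are hypothetical names made up for illustration, and only four join types are modelled. It assumes that the two sides share no column names other than the join keys.

```python
# Plain-Python model of equi-join semantics; this is not how Spark executes a join.
# The alias names are taken from the `how` parameter list above.
HOW_ALIASES = {
    "inner": "inner",
    "cross": "cross",
    "outer": "full_outer", "full": "full_outer",
    "fullouter": "full_outer", "full_outer": "full_outer",
    "left": "left_outer", "leftouter": "left_outer", "left_outer": "left_outer",
    "right": "right_outer", "rightouter": "right_outer", "right_outer": "right_outer",
    "semi": "left_semi", "leftsemi": "left_semi", "left_semi": "left_semi",
    "anti": "left_anti", "leftanti": "left_anti", "left_anti": "left_anti",
}


def equi_join(left, right, keys, how="inner"):
    """Join two lists of dicts on equal key columns (hypothetical helper)."""
    kind = HOW_ALIASES[how]
    if kind not in ("inner", "left_outer", "left_semi", "left_anti"):
        raise NotImplementedError(f"{kind} is not modelled in this sketch")

    # Index the right side by its key values.
    right_by_key = {}
    for r in right:
        right_by_key.setdefault(tuple(r[k] for k in keys), []).append(r)

    # Right-only columns, set to None when a left row has no match.
    null_right = {c: None for r in right for c in r if c not in keys}

    out = []
    for l in left:
        key = tuple(l[k] for k in keys)
        # A null key never equals anything, so it never matches.
        matches = [] if None in key else right_by_key.get(key, [])
        if kind == "left_semi":
            if matches:
                out.append(dict(l))          # left columns only, at most once
        elif kind == "left_anti":
            if not matches:
                out.append(dict(l))          # left rows without a match
        elif matches:
            out.extend({**r, **l} for r in matches)
        elif kind == "left_outer":
            out.append({**null_right, **l})  # keep the unmatched left row
    return out


people = [{"name": "Alice", "age": 10}, {"name": "Bob", "age": 5}]
heights = [{"name": "Alice", "height": 80}, {"name": "Tom", "height": 50}]

inner_rows = equi_join(people, heights, ["name"], "inner")
left_rows = equi_join(people, heights, ["name"], "left")
semi_rows = equi_join(people, heights, ["name"], "leftsemi")
anti_rows = equi_join(people, heights, ["name"], "left_anti")
```

With this sample data, inner keeps only Alice, left also keeps Bob with a null height, semi returns Alice's left-side columns only, and anti returns Bob.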
lit
Creates a column of literal value.
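from pyspark.sql.functions import lit

file = '/mnt/dbwarehouse/raw/user.csv'
df = df.withColumn('ingest_file', lit(file)) \
       .withColumn('converted', lit(True))
display(df)  # display() is a Databricks notebook helper; use df.show() elsewhere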

na.fill & fillna
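na.fill: fills null values across the whole table, but only in columns whose type matches the fill value.
fillna: same usage as na.fill.
df.na.fill(50).show() # name is a string column, so it is not filled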
+---+------+-----+
|age|height| name|
+---+------+-----+
| 10|    80|Alice|
|  5|    50|  Bob|
| 50|    50|  Tom|
| 50|    50| null|
+---+------+-----+
df.na.fill(False).show() # only the spy column is boolean, so only that column is filled
+----+-------+-----+
| age|   name|  spy|
+----+-------+-----+
|  10|  Alice|false|
|   5|    Bob|false|
|null|Mallory| true|
+----+-------+-----+
df.na.fill({'age': 50, 'name': 'unknown'}).show() # fill values specified per column name
+---+------+-------+
|age|height|   name|
+---+------+-------+
| 10|    80|  Alice|
|  5|  null|    Bob|
| 50|  null|    Tom|
| 50|  null|unknown|
+---+------+-------+
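The type-matching rule shown in the three outputs above can be modelled in a few lines of plain Python. This is only a sketch of the observed behaviour, not Spark's implementation: kind_of, fill_nulls and the schema dictionary are hypothetical names, and the input rows are an assumption reconstructed from the first and third tables.

```python
# Plain-Python model of the na.fill behaviour shown above; not Spark code.

def kind_of(value):
    """Classify a fill value as boolean, numeric or string."""
    if isinstance(value, bool):          # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "numeric"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported fill value: {value!r}")


def fill_nulls(rows, schema, value):
    """Replace None values (hypothetical helper).

    A scalar value is applied only to columns whose kind matches it.
    A dict is applied to exactly the columns it names (a simplification:
    no type check is modelled for the dict form).
    """
    if isinstance(value, dict):
        replacements = dict(value)
    else:
        wanted = kind_of(value)
        replacements = {col: value for col, kind in schema.items() if kind == wanted}
    return [
        {col: (replacements.get(col) if val is None else val) for col, val in row.items()}
        for row in rows
    ]


# Assumed input, reconstructed from the outputs above.
schema = {"age": "numeric", "height": "numeric", "name": "string"}
rows = [
    {"age": 10, "height": 80, "name": "Alice"},
    {"age": 5, "height": None, "name": "Bob"},
    {"age": None, "height": None, "name": "Tom"},
    {"age": None, "height": None, "name": None},
]

filled_scalar = fill_nulls(rows, schema, 50)
filled_dict = fill_nulls(rows, schema, {"age": 50, "name": "unknown"})
```

Under these assumptions, filled_scalar leaves the last row's name as None, while filled_dict leaves every missing height as None, matching the first and third tables.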
Copyright notice
This article was written by [wangyanglongcc]. Please include a link to the original when reposting. Thank you.
https://blog.csdn.net/qq_33246702/article/details/124327639