[PySpark] Getting Started with UDFs
2022-08-09 03:35:00 【山顶夕景】
Method 1: Using a UDF with select
As an example, let's capitalize the first letter of every word in the Name column:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()
columns = ["Seqno", "Name"]
data = [("1", "john jones"),
        ("2", "tracey smith"),
        ("3", "amy sanders")]
df = spark.createDataFrame(data=data, schema=columns)
df.show(truncate=False)
+-----+------------+
|Seqno|Name        |
+-----+------------+
|1    |john jones  |
|2    |tracey smith|
|3    |amy sanders |
+-----+------------+
def convertCase(s):
    # Capitalize the first letter of each space-separated word.
    resStr = ""
    arr = s.split(" ")
    for x in arr:
        resStr = resStr + x[0:1].upper() + x[1:] + " "
    return resStr

# Convert the function into a UDF
convertUDF = udf(lambda z: convertCase(z), StringType())
# The default return type is StringType(), so the line above is optional
# and the returnType argument can be omitted:
convertUDF = udf(lambda z: convertCase(z))

df.select(col("Seqno"), \
    convertUDF(col("Name")).alias("Name")) \
    .show(truncate=False)
+-----+-------------+
|Seqno|Name         |
+-----+-------------+
|1    |John Jones   |
|2    |Tracey Smith |
|3    |Amy Sanders  |
+-----+-------------+
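Because convertCase appends a space after every word, each converted name ends with a trailing space, which is why the Name column in the output is one character wider than the longest name. Below is a minimal sketch of a variant that joins the words instead. The name convert_case_joined is made up for this illustration and is not part of the original post. It is plain Python, so it can be checked without a Spark session, and it could presumably be wrapped the same way as before, e.g. udf(convert_case_joined, StringType()).

```python
def convert_case_joined(s):
    """Capitalize the first letter of each space-separated word.

    Hypothetical variant of convertCase: words are joined with single
    spaces, so no trailing space is added to the result.
    """
    return " ".join(w[0:1].upper() + w[1:] for w in s.split(" "))


print(convert_case_joined("john jones"))
print(convert_case_joined("tracey smith"))
```

Testing the plain function first is a cheap way to catch string-handling mistakes before they show up inside a Spark job.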
Method 2: Using a UDF with withColumn
def upperCase(s):
    return s.upper()

upperCaseUDF = udf(lambda z: upperCase(z), StringType())
df.withColumn("Curated Name", upperCaseUDF(col("Name"))) \
    .show(truncate=False)
+-----+------------+------------+
|Seqno|Name        |Curated Name|
+-----+------------+------------+
|1    |john jones  |JOHN JONES  |
|2    |tracey smith|TRACEY SMITH|
|3    |amy sanders |AMY SANDERS |
+-----+------------+------------+
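One thing to watch for: Spark passes null column values to a Python UDF as None, so a function like upperCase would raise an AttributeError on a null Name. Here is a minimal sketch of one way to guard against that, assuming a small helper decorator. Both null_safe and upper_case are hypothetical names introduced for this illustration. It is plain Python and can be tried without Spark; the decorated function could then be passed to udf as usual.

```python
import functools


def null_safe(func):
    """Hypothetical helper: return None for None input instead of calling func."""
    @functools.wraps(func)
    def wrapper(value):
        return None if value is None else func(value)
    return wrapper


@null_safe
def upper_case(s):
    # Same logic as upperCase above, now skipped when the input is None.
    return s.upper()


print(upper_case("john jones"))
print(upper_case(None))
```

Keeping the None check in a decorator keeps the transformation function itself unchanged, at the cost of one extra Python call per row.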
Reference
[1] https://sparkbyexamples.com/pyspark/pyspark-udf-user-defined-function/