[PySpark] Getting Started with UDFs
2022-08-09 03:35:00 【山顶夕景】
Method 1: using select
The following example, adapted from [1], capitalizes the first letter of every word in the Name column:
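```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()
columns = ["Seqno", "Name"]
data = [("1", "john jones"),
        ("2", "tracey smith"),
        ("3", "amy sanders")]
df = spark.createDataFrame(data=data, schema=columns)
df.show(truncate=False)
```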
```
+-----+------------+
|Seqno|Name        |
+-----+------------+
|1    |john jones  |
|2    |tracey smith|
|3    |amy sanders |
+-----+------------+
```
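```python
def convertCase(s):
    # Capitalize the first letter of each space-separated word.
    # Note: a space is appended after every word, including the last one.
    resStr = ""
    arr = s.split(" ")
    for x in arr:
        resStr = resStr + x[0:1].upper() + x[1:] + " "
    return resStr
```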
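convertCase has two quirks worth knowing before it goes into a UDF. It appends a space after every word, so every result ends with a trailing space, and it raises AttributeError on None, which is what a Python UDF receives when the column value is SQL NULL. Since a UDF body is ordinary Python, both can be checked locally without starting Spark. The sketch below is one possible variant, not the original author's code; the name convert_case and the choice to pass None through unchanged are assumptions.

```python
from typing import Optional


def convert_case(s: Optional[str]) -> Optional[str]:
    """Capitalize the first letter of each space-separated word.

    Hypothetical variant of convertCase (an assumption, not from the
    original example): returns None for None input and joins words with
    single spaces, so no trailing space is added.
    """
    if s is None:
        return None  # let SQL NULL pass through instead of raising
    # split(" ") keeps empty strings for repeated spaces, so the join
    # restores the original spacing; x[:1] is safe on an empty string
    return " ".join(x[:1].upper() + x[1:] for x in s.split(" "))


# Quick local checks, no Spark needed
print(convert_case("john jones"))   # John Jones
print(convert_case(None))           # None
# Unlike str.title(), the rest of each word is left untouched
print(convert_case("mcDonald"))     # McDonald
print("mcDonald".title())           # Mcdonald
```

It would be wrapped the same way as the original, for example udf(convert_case, StringType()). The comparison with str.title() is there because title() lowercases the remaining letters of each word, so it is not a drop-in replacement.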
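```python
# Turn the Python function into a UDF, declaring StringType() as the return type.
# A lambda wrapper is not needed; the function can be passed directly.
convertUDF = udf(convertCase, StringType())

# StringType() is udf()'s default return type, so this line is equivalent:
convertUDF = udf(convertCase)
```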
```python
df.select(col("Seqno"),
          convertUDF(col("Name")).alias("Name")) \
  .show(truncate=False)
```
```
+-----+-------------+
|Seqno|Name         |
+-----+-------------+
|1    |John Jones   |
|2    |Tracey Smith |
|3    |Amy Sanders  |
+-----+-------------+
```
Method 2: using withColumn
```python
def upperCase(s):
    # Convert the whole string to upper case
    return s.upper()

upperCaseUDF = udf(upperCase, StringType())
```
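upperCase fails the same way on missing data: s.upper() raises AttributeError when Name is null, because the UDF receives None. Instead of repeating an `if s is None` check in every UDF body, a small decorator can propagate None. This is a minimal sketch; null_safe is a hypothetical helper, not a PySpark API.

```python
import functools
from typing import Any, Callable, Optional


def null_safe(fn: Callable[[Any], Any]) -> Callable[[Any], Optional[Any]]:
    """Hypothetical helper: wrap a one-argument function so None in gives None out."""
    @functools.wraps(fn)
    def wrapper(value: Any) -> Optional[Any]:
        if value is None:
            return None  # propagate SQL NULL instead of raising
        return fn(value)
    return wrapper


@null_safe
def upper_case(s: str) -> str:
    return s.upper()


print(upper_case("amy sanders"))  # AMY SANDERS
print(upper_case(None))           # None
```

The decorated function is still a plain Python callable, so it can be passed to udf just like upperCase, e.g. udf(upper_case, StringType()).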
```python
df.withColumn("Curated Name", upperCaseUDF(col("Name"))) \
  .show(truncate=False)
```
```
+-----+------------+------------+
|Seqno|Name        |Curated Name|
+-----+------------+------------+
|1    |john jones  |JOHN JONES  |
|2    |tracey smith|TRACEY SMITH|
|3    |amy sanders |AMY SANDERS |
+-----+------------+------------+
```
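The two methods differ in which columns survive: select returns only the expressions it lists, while withColumn keeps every existing column and adds the new one, replacing an existing column if the name is already taken. The sketch below models that difference in plain Python over the sample rows. It is a simplified illustration only (real DataFrames are lazy and distributed), and the helpers select_rows and with_column_rows are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]

rows: List[Row] = [
    {"Seqno": "1", "Name": "john jones"},
    {"Seqno": "2", "Name": "tracey smith"},
    {"Seqno": "3", "Name": "amy sanders"},
]


def select_rows(rows: List[Row], exprs: Dict[str, Callable[[Row], str]]) -> List[Row]:
    """Toy model of select: the output contains only the listed columns."""
    return [{name: expr(row) for name, expr in exprs.items()} for row in rows]


def with_column_rows(rows: List[Row], name: str, expr: Callable[[Row], str]) -> List[Row]:
    """Toy model of withColumn: keep all columns, then add or overwrite `name`."""
    return [{**row, name: expr(row)} for row in rows]


selected = select_rows(rows, {"Seqno": lambda r: r["Seqno"],
                              "Name": lambda r: r["Name"].upper()})
added = with_column_rows(rows, "Upper Name", lambda r: r["Name"].upper())
replaced = with_column_rows(rows, "Name", lambda r: r["Name"].upper())

print(list(selected[0]))  # ['Seqno', 'Name']
print(list(added[0]))     # ['Seqno', 'Name', 'Upper Name']
print(replaced[0])        # {'Seqno': '1', 'Name': 'JOHN JONES'}
```

In Spark terms, passing an existing column name such as "Name" to withColumn overwrites that column rather than adding a second one.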
Reference
[1] https://sparkbyexamples.com/pyspark/pyspark-udf-user-defined-function/