[PySpark] Getting Started with UDFs
2022-08-09 03:35:00 【山顶夕景】
Method 1: using select
As an example, we capitalize the first letter of every word in the Name column:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()
columns = ["Seqno", "Name"]
data = [("1", "john jones"),
        ("2", "tracey smith"),
        ("3", "amy sanders")]
df = spark.createDataFrame(data=data, schema=columns)
df.show(truncate=False)
+-----+------------+
|Seqno|Name        |
+-----+------------+
|1    |john jones  |
|2    |tracey smith|
|3    |amy sanders |
+-----+------------+
def convertCase(s):
    # Null cells arrive in the Python function as None; keep them null
    if s is None:
        return None
    # Capitalize the first letter of each space-separated word
    # (joining with " " avoids the trailing space a += loop would add)
    return " ".join(x[0:1].upper() + x[1:] for x in s.split(" "))

# Wrap the Python function as a UDF with an explicit return type
convertUDF = udf(convertCase, StringType())
# Equivalent: the default return type is already StringType(), so it can be omitted
convertUDF = udf(convertCase)

df.select(col("Seqno"),
          convertUDF(col("Name")).alias("Name")) \
    .show(truncate=False)

+-----+------------+
|Seqno|Name        |
+-----+------------+
|1    |John Jones  |
|2    |Tracey Smith|
|3    |Amy Sanders |
+-----+------------+
Method 2: using withColumn
def upperCase(s):
    # Pass nulls through instead of calling .upper() on None
    return s.upper() if s is not None else None

upperCaseUDF = udf(upperCase, StringType())
# withColumn appends a new column computed by the UDF
df.withColumn("Curated Name", upperCaseUDF(col("Name"))) \
    .show(truncate=False)

+-----+------------+------------+
|Seqno|Name        |Curated Name|
+-----+------------+------------+
|1    |john jones  |JOHN JONES  |
|2    |tracey smith|TRACEY SMITH|
|3    |amy sanders |AMY SANDERS |
+-----+------------+------------+
Reference
[1] https://sparkbyexamples.com/pyspark/pyspark-udf-user-defined-function/