08-UDFs
2022-04-22 19:16:00 【wangyanglongcc】
User-Defined Functions
- Define a function
- Create and apply UDF
- Register UDF to use in SQL
- Use Decorator Syntax (Python Only)
- Use Vectorized UDF (Python Only)
Methods
- UDF Registration (spark.udf): register
- Built-In Functions: udf
- Python UDF Decorator: @udf
- Pandas UDF Decorator: @pandas_udf
Define a function
Define a function in local Python/Scala to get the first letter of a string from the email field.
def firstLetterFunction(email):
    return email[0]
This plain Python function cannot be used directly on a Spark DataFrame column, so the call below does not work:
from pyspark.sql.functions import col
display(salesDF.select(firstLetterFunction(col("email"))))

Create and apply UDF
Wrap the function with udf to turn it into a UDF; the result can then be applied to DataFrame columns.
from pyspark.sql.functions import udf
firstLetterUDF = udf(firstLetterFunction)
display(salesDF.select(firstLetterUDF(col("email"))))
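The function above indexes `email[0]` unconditionally, which raises on `None` or on an empty string, and Spark passes SQL NULL values to a Python UDF as `None`. The following is a minimal sketch of a more defensive variant, assuming the email column may contain nulls or empty strings (the source does not say whether it does). The names `first_letter_safe` and `make_first_letter_udf` are hypothetical and not part of the original post.

```python
from typing import Optional


def first_letter_safe(email: Optional[str]) -> Optional[str]:
    """Return the first character of email, or None for null/empty input."""
    # Spark hands SQL NULL to a Python UDF as None, and "" has no first character.
    if not email:
        return None
    return email[0]


def make_first_letter_udf():
    """Wrap first_letter_safe as a Spark UDF with an explicit string return type."""
    # Imported lazily so the plain function stays usable and testable without Spark.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    return udf(first_letter_safe, StringType())
```

With a Spark session available, this could be applied the same way as firstLetterUDF, for example `salesDF.select(make_first_letter_udf()(col("email")))`. Keeping the plain function separate from the wrapper also lets it be unit tested locally.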

Register UDF to use in SQL
Register the UDF with spark.udf.register to make it available in the SQL namespace.
salesDF.createOrReplaceTempView("sales")
spark.udf.register("sql_udf", firstLetterFunction)
SELECT email,sql_udf(email) AS firstLetter FROM sales
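The registration and query steps can also be driven entirely from Python. The sketch below assumes an existing SparkSession is passed in as `spark` and that a temporary view named `sales` already exists, as in the snippet above. `register_and_query` and `first_letter` are hypothetical helper names. The return type is given as the type string `"string"`, which `spark.udf.register` accepts in place of a `DataType` object.

```python
def first_letter(email):
    """Plain Python function: first character, or None for null/empty input."""
    return email[0] if email else None


def register_and_query(spark, view_name="sales"):
    """Register first_letter under the SQL name sql_udf and run a query with it."""
    # register returns the wrapped UDF, so the same object can also be used
    # from the DataFrame API.
    first_letter_udf = spark.udf.register("sql_udf", first_letter, "string")
    result = spark.sql(
        f"SELECT email, sql_udf(email) AS firstLetter FROM {view_name}"
    )
    return first_letter_udf, result
```

In a notebook this would be called as `first_letter_udf, result = register_and_query(spark)` after `salesDF.createOrReplaceTempView("sales")`, and `result` could then be passed to `display`.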

Use Decorator Syntax (Python Only)
Alternatively, define a UDF with the Python decorator syntax, passing the data type that the function returns.
# Our input/output is a string
@udf("string")
def decoratorUDF(email: str) -> str:
    return email[0]
from pyspark.sql.functions import col
salesDF = spark.read.parquet("/mnt/dbswarehouse/raw/sales.parquet")
display(salesDF.select(decoratorUDF(col("email"))))

Use Vectorized UDF (Python Only)
import pandas as pd
from pyspark.sql.functions import pandas_udf
# We have a string input/output
@pandas_udf("string")
def vectorizedUDF(email: pd.Series) -> pd.Series:
    return email.str[0]
# Alternatively
vectorizedUDF = pandas_udf(lambda s: s.str[0], "string")
display(salesDF.select(vectorizedUDF(col("email"))))

We can also register these Vectorized UDFs to the SQL namespace.
spark.udf.register("sql_vectorized_udf", vectorizedUDF)
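A vectorized (pandas) UDF differs from a regular UDF in that its body receives a whole batch of values as a `pandas.Series` and is expected to return a Series of the same length, rather than being called once per row. The body can therefore be tried out locally with plain pandas before wrapping it. The sketch below does that; the sample email values and the name `first_letter_batch` are made up for illustration.

```python
import pandas as pd


def first_letter_batch(emails: pd.Series) -> pd.Series:
    """Series-in, Series-out body that could be wrapped with pandas_udf."""
    # .str[0] takes the first character of each element; missing values and
    # empty strings come back as missing values instead of raising.
    return emails.str[0]


# Hypothetical sample batch, including a null and an empty string.
batch = pd.Series(["anna@example.com", None, ""])
result = first_letter_batch(batch)
```

With Spark and pyarrow available, the same function could be wrapped as `pandas_udf(first_letter_batch, "string")` and used like vectorizedUDF above.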
Copyright notice
This article was written by [wangyanglongcc]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204221909134137.html