02-SparkSQL
2022-04-22 04:08:00 【wangyanglongcc】
Common methods
1. Run a SQL statement with spark.sql
sqlstr = """
select store_code, store_name, location
from store
where country = 'CN'
order by id
"""
spark.sql(sqlstr)
2. View data with show and display
show
df.show()

display (recommended; available in Databricks notebooks)
display(df)

3. Create a DataFrame from an existing table: spark.table and spark.sql
spark.table
df = spark.table('user')
spark.sql
df = spark.sql('select * from user')
4. Inspect the schema
df.printSchema()

df.schema

%scala
spark.read.parquet("/mnt/training/ecommerce/events/events.parquet").schema.toDDL

5. List the DataFrame's column names: df.schema.names
6. Count rows with count
df.count()
7. where, select, orderBy
SQL
select name,price
from product
where price < 200
order by price
DataFrame API equivalent (where, select, orderBy)
df_product\
.where('price < 200')\
.select('name','price')\
.orderBy('price') # descending: orderBy('price', ascending=False)
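8. Register a temporary view with createOrReplaceTempView
df.createOrReplaceTempView('product') # A view created this way is scoped to the current notebook (its SparkSession); it cannot be queried from elsewhere
spark.sql('select * from product where price < 200') # Queries the product view registered above
9. Register a global view with createOrReplaceGlobalTempView
df.createOrReplaceGlobalTempView('product') # A view created this way is global: it can be used in the current notebook and in other notebooks as well
Querying it
spark.sql('select * from product where price < 200')
The call above does not reach the global view: the unqualified name product is not looked up in global_temp, so the query fails unless a session-level view with the same name also exists. Global views are stored in the global_temp database, so the name must be qualified with that database:
spark.sql('select * from global_temp.product where price < 200')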
Copyright notice
This article was written by [wangyanglongcc]. Please include a link to the original when reposting. Thank you.
https://blog.csdn.net/qq_33246702/article/details/124327461