14-Spark: Setting Up Automatic (Dynamic) Partitioning
2022-04-22 19:09:00 【wangyanglongcc】
Notes
First, adjust the configuration:
spark.conf.set("spark.sql.sources.partitionOverwriteMode","dynamic")
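Why this setting matters: in the default `static` mode, an `INSERT OVERWRITE` with no partition spec first deletes every existing partition of the table, so partitions that are absent from the new data are lost. In `dynamic` mode, Spark deletes nothing up front and only replaces the partitions that actually receive rows. The pure-Python model below illustrates that difference. It is a simplification, not Spark's implementation: a "table" is assumed to be a dict from a `(year, month)` tuple to the rows in that partition, and `insert_overwrite` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical model: a "table" is a dict mapping a partition key
# (year, month) to the list of rows stored in that partition.

def insert_overwrite(table, new_rows, mode):
    """Simulate INSERT OVERWRITE without a PARTITION clause."""
    incoming = defaultdict(list)
    for row in new_rows:
        incoming[(row["year"], row["month"])].append(row)
    if mode == "static":
        # Static: all existing partitions are dropped before writing.
        result = {}
    elif mode == "dynamic":
        # Dynamic: existing partitions are kept unless they receive new data.
        result = dict(table)
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Every partition that receives data is replaced wholesale.
    result.update(incoming)
    return result

existing = {
    (2020, 3): [{"id": 1, "year": 2020, "month": 3}],
    (2021, 5): [{"id": 2, "year": 2021, "month": 5}],
}
batch = [{"id": 9, "year": 2021, "month": 5}]

print(sorted(insert_overwrite(existing, batch, "static")))   # [(2021, 5)]
print(sorted(insert_overwrite(existing, batch, "dynamic")))  # [(2020, 3), (2021, 5)]
```

With `static`, the 2020-03 partition disappears even though the new batch never mentions it; with `dynamic`, it survives and only 2021-05 is rewritten.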
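When writing to a partitioned table, pay close attention to column order: the partition columns must come last, and if the table is partitioned by several columns, they must also appear in the same order as in the table definition. The helper below reorders a DataFrame accordingly.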
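def re_arrange_partition_columns(df, partition_columns):
    '''
    df : input DataFrame, of type pyspark.sql.DataFrame
    partition_columns : list of partition column names; order matters and must match the table's partitioning
    '''
    # Non-partition columns first, keeping their existing order
    column_list = [column for column in df.columns if column not in partition_columns]
    # Partition columns last, in the order given
    column_list.extend(partition_columns)
    return df.select(column_list)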
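The ordering rule exists because `INSERT OVERWRITE ... SELECT *` (like `DataFrameWriter.insertInto`) resolves columns by position, not by name. The sketch below shows the effect on plain lists of column names, without Spark. `rearrange` applies the same rule as `re_arrange_partition_columns`, and `positional_insert` is a hypothetical model of position-based resolution. In real Spark, depending on the column types, a misordered insert is either rejected on a type mismatch or silently writes values into the wrong columns.

```python
def rearrange(columns, partition_columns):
    """Same ordering rule as re_arrange_partition_columns, on a list of names."""
    return [c for c in columns if c not in partition_columns] + list(partition_columns)

def positional_insert(target_columns, source_columns, row):
    """Hypothetical model of position-based column resolution:
    the i-th source value is stored in the i-th target column."""
    values = [row[c] for c in source_columns]
    return dict(zip(target_columns, values))

target = ["id", "sku", "qty", "year", "month"]   # table order: partition columns last
source = ["year", "month", "id", "sku", "qty"]   # a DataFrame with partition columns first
row = {"id": 1, "sku": "a", "qty": 1, "year": 2020, "month": 3}

print(positional_insert(target, source, row))
# {'id': 2020, 'sku': 3, 'qty': 1, 'year': 'a', 'month': 1}  (values in the wrong columns)
print(positional_insert(target, rearrange(source, ["year", "month"]), row))
# {'id': 1, 'sku': 'a', 'qty': 1, 'year': 2020, 'month': 3}
```

Passing `["month", "year"]` instead would put `month` before `year`, which is why the partition list must follow the table's own order.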
Example
The following walks through writing data with dynamic partitioning.
Create the table
drop table if exists orders;
create table if not exists orders (
  id int,
  sku string,
  qty int
)
using parquet
partitioned by (year int, month int);
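With `partitioned by (year int, month int)`, Spark stores each partition in a nested Hive-style directory, `year=<value>/month=<value>`, under the table location; the partition values are encoded in the path rather than in the Parquet data files. The helper below is a hypothetical sketch that computes that relative path for a row. It does not handle the escaping Spark applies to special characters in partition values.

```python
def partition_path(row, partition_columns):
    """Hypothetical helper: build the Hive-style relative directory for a row,
    e.g. year=2020/month=3. Special-character escaping is not handled."""
    return "/".join(f"{col}={row[col]}" for col in partition_columns)

rows = [
    {"id": 1, "sku": "a", "qty": 1, "year": 2020, "month": 3},
    {"id": 4, "sku": "d", "qty": 7, "year": 2021, "month": 9},
]
for r in rows:
    print(partition_path(r, ["year", "month"]))
# year=2020/month=3
# year=2021/month=9
```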
Generate mock data
import pandas as pd
df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'sku': ['a', 'b', 'c', 'd'],
    'qty': [1, 3, 5, 7],
    'year': [2020, 2021, 2022, 2021],
    'month': [3, 5, 7, 9]
})
spark.createDataFrame(df).createOrReplaceTempView('df_order')
df

Enable dynamic partition overwrite and write the data
spark.conf.set("spark.sql.sources.partitionOverwriteMode","dynamic")
sqlstr = ''' insert overwrite orders select * from df_order '''
spark.sql(sqlstr)

Copyright notice
This article was written by [wangyanglongcc]. Please include a link to the original when reposting. Thanks.
https://blog.csdn.net/qq_33246702/article/details/124341589