12-Delta Lake
2022-04-22 19:16:00 【wangyanglongcc】
Create/Write a Delta Table
eventsDF = spark.read.parquet(eventsPath)
1. Convert data to a Delta table using the schema provided by the DataFrame
Use save to write the Delta table to a path.
deltaPath = f"{delta_path}/delta-events"
eventsDF.write.format("delta").mode("overwrite").save(deltaPath)
2. Create a Delta table in the metastore
Use saveAsTable to save the DataFrame as a named table.
eventsDF.write.format("delta").mode("overwrite").saveAsTable("delta_events")
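3. Write a partitioned table
Use partitionBy to partition the saved data by a column. The overwriteSchema option is needed here because adding the state column changes the schema of the table that already exists at deltaPath.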
from pyspark.sql.functions import col
stateEventsDF = eventsDF.withColumn("state", col("geo.state"))
stateEventsDF.write.format("delta").mode("overwrite").partitionBy("state").option("overwriteSchema", "true").save(deltaPath)
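One benefit of partitioning by state is that a filter on that column lets Spark skip the files of every other partition. The following is a minimal sketch, assuming the table was written with partitionBy("state") as above; read_state_events is a hypothetical helper name and is not part of the Delta Lake API.

```python
def read_state_events(spark, delta_path, state):
    """Load the Delta table at delta_path and keep only the rows for one state.

    Assumes the table is partitioned by a column named "state".
    """
    df = spark.read.format("delta").load(delta_path)
    # Filtering on the partition column allows Spark to prune other partitions.
    return df.where(df["state"] == state)
```

For example, read_state_events(spark, deltaPath, "CA") would return only the rows whose state is "CA" (the value "CA" is an assumption about the data).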
Read from your Delta table
df = spark.read.format("delta").load(deltaPath)
display(df)
Access previous versions of table using Time Travel
- Use DESCRIBE HISTORY followed by the table name to view the table's version information
describe history delta_user

In this example, the history output shows that the table has 4 versions.
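The same history is also available from Python through DeltaTable.history(), which returns a DataFrame with one row per commit, newest first. The sketch below reads the latest version number from it; latest_version is a hypothetical helper, and it assumes you already hold a DeltaTable object, for instance from DeltaTable.forName(spark, "delta_user").

```python
def latest_version(delta_table):
    """Return the most recent committed version number of a Delta table.

    delta_table is expected to be a delta.tables.DeltaTable instance.
    """
    # history(1) limits the result to the single most recent commit.
    row = delta_table.history(1).select("version").first()
    return row["version"]
```

A table whose history lists 4 versions is numbered from 0, so this helper would return 3 for it.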
- Use VERSION AS OF (SQL) or the versionAsOf reader option to read the data as of a specific version
- Use TIMESTAMP AS OF (SQL) or the timestampAsOf reader option to read the data as of a specific timestamp
df = spark.read.format("delta").option("versionAsOf", 0).load(deltaPath)
df = spark.read.format("delta").option("timestampAsOf", "2021-10-23T11:07:57.000+0000").load(deltaPath)
SELECT * FROM delta_events TIMESTAMP AS OF '2021-10-23T11:07:57.000+0000'
SELECT * FROM delta.`/delta_events` VERSION AS OF 0
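The two SQL forms differ only in the clause after the table name: a version is an unquoted integer, while a timestamp is a quoted string. As an illustration only, the hypothetical helper below builds either query as a string. It does no escaping, so it is a sketch for trusted input rather than something to expose to user-supplied values.

```python
def time_travel_query(table, version=None, timestamp=None):
    """Build a SELECT that reads one historical snapshot of a Delta table.

    table may be a metastore name or a path reference such as delta.`/some/path`.
    Exactly one of version or timestamp must be given.
    """
    if (version is None) == (timestamp is None):
        raise ValueError("pass exactly one of version or timestamp")
    if version is not None:
        # Version numbers are integers and are written without quotes.
        return f"SELECT * FROM {table} VERSION AS OF {int(version)}"
    # Timestamps are written as quoted string literals.
    return f"SELECT * FROM {table} TIMESTAMP AS OF '{timestamp}'"
```

The resulting string can be passed to spark.sql(...), for example spark.sql(time_travel_query("delta_events", version=0)).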
Vacuum
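We can clean up the table directory using VACUUM, which deletes data files that the current table version no longer references and that are older than the retention period. Vacuum accepts a retention period in hours as input.
By default, to prevent accidentally vacuuming recent commits, Delta Lake will not let users vacuum with a retention period under 7 days (168 hours). Once files
are vacuumed, you can no longer time travel to the versions that depended on them. With a retention period of 0 hours, only the most recent version of the Delta table remains readable.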
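from delta.tables import DeltaTable
# Disable the retention safety check so that a period shorter than 168 hours is accepted
spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")
deltaTable = DeltaTable.forPath(spark, deltaPath)
# Retain 0 hours: every file not needed by the latest version is deleted
deltaTable.vacuum(0)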
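Because vacuuming with a retention of 0 hours removes the ability to time travel, it may be worth wrapping the call in a guard outside of demos. The sketch below is one possible approach, not a Delta Lake feature: vacuum_safely and DEFAULT_RETENTION_HOURS are hypothetical names, and the function only forwards to DeltaTable.vacuum() when the requested period is at least the 168-hour default.

```python
DEFAULT_RETENTION_HOURS = 7 * 24  # 168 hours, the default minimum described above


def vacuum_safely(delta_table, retention_hours=DEFAULT_RETENTION_HOURS):
    """Vacuum a Delta table, refusing retention periods shorter than 7 days.

    delta_table is expected to be a delta.tables.DeltaTable instance.
    Returns the retention period that was actually used.
    """
    if retention_hours < DEFAULT_RETENTION_HOURS:
        raise ValueError(
            f"retention of {retention_hours}h is below the "
            f"{DEFAULT_RETENTION_HOURS}h minimum"
        )
    # Deletes unreferenced files older than the retention period.
    delta_table.vacuum(retention_hours)
    return retention_hours
```

With this guard, vacuum_safely(deltaTable) uses the 168-hour default, while vacuum_safely(deltaTable, 0) raises a ValueError instead of deleting history.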
Copyright notice
This article was written by [wangyanglongcc]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204221909133973.html