05-Aggregation
2022-04-22 04:09:00 【wangyanglongcc】
Grouping data
Use the DataFrame groupBy method to create a grouped data object
This grouped data object is called RelationalGroupedDataset in Scala and GroupedData in Python
df.groupBy("geo.state", "geo.city")

Grouped data methods
Various aggregate methods are available on the grouped data object
eventCountsDF = df.groupBy("event_name").count()
display(eventCountsDF)

cityPurchaseQuantitiesDF = df.groupBy("geo.state", "geo.city").sum("ecommerce.total_item_quantity")
display(cityPurchaseQuantitiesDF)
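
As a rough illustration of what these two calls compute, the sketch below reproduces the grouping logic in plain Python rather than in Spark. The sample rows and the helper name `get_path` are made up for this example; only the column names come from the code above.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows; nested dicts stand in for Spark struct columns.
rows = [
    {"event_name": "purchase", "geo": {"state": "CA", "city": "LA"},
     "ecommerce": {"total_item_quantity": 2}},
    {"event_name": "view", "geo": {"state": "CA", "city": "SF"},
     "ecommerce": {"total_item_quantity": 1}},
    {"event_name": "purchase", "geo": {"state": "NY", "city": "NYC"},
     "ecommerce": {"total_item_quantity": 3}},
    {"event_name": "purchase", "geo": {"state": "CA", "city": "LA"},
     "ecommerce": {"total_item_quantity": 1}},
]


def get_path(row, path):
    """Resolve a dotted path such as "geo.state" against a nested dict."""
    value = row
    for part in path.split("."):
        value = value[part]
    return value


# Comparable to df.groupBy("event_name").count(): one row count per key.
event_counts = Counter(get_path(r, "event_name") for r in rows)

# Comparable to df.groupBy("geo.state", "geo.city").sum(...):
# one running total per (state, city) key.
city_quantities = defaultdict(int)
for r in rows:
    key = (get_path(r, "geo.state"), get_path(r, "geo.city"))
    city_quantities[key] += get_path(r, "ecommerce.total_item_quantity")

print(dict(event_counts))
print(dict(city_quantities))
```

Each distinct key (or key tuple, when grouping by several columns) ends up with exactly one output value, which is the shape of the DataFrame that Spark returns.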

Built-in aggregate functions
This is the most commonly used approach: it can compute several aggregates at once and give each result an alias. By comparison, the grouped data methods shown above are quite limited.
Use the grouped data method agg to apply built-in aggregate functions
This allows you to apply other transformations on the resulting columns, such as alias
from pyspark.sql.functions import avg, approx_count_distinct
stateAggregatesDF = df.groupBy("geo.state").agg(
    avg("ecommerce.total_item_quantity").alias("avg_quantity"),
    approx_count_distinct("user_id", 0.01).alias("distinct_users"))
display(stateAggregatesDF)
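
To show the shape of the result that agg produces, here is a minimal plain-Python sketch (not Spark) that computes two aggregates per group and names each one. The sample rows are hypothetical and flattened for brevity, and the distinct count is exact, whereas approx_count_distinct returns an estimate whose second argument (0.01 above) is the maximum relative standard deviation allowed.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows, flattened instead of using nested geo/ecommerce fields.
rows = [
    {"state": "CA", "user_id": "u1", "total_item_quantity": 2},
    {"state": "CA", "user_id": "u1", "total_item_quantity": 4},
    {"state": "CA", "user_id": "u2", "total_item_quantity": 3},
    {"state": "NY", "user_id": "u3", "total_item_quantity": 5},
]

# Collect the rows of each state, as groupBy("geo.state") would.
groups = defaultdict(list)
for row in rows:
    groups[row["state"]].append(row)

# Compute several aggregates per group; the dict keys play the role of alias().
state_aggregates = {
    state: {
        "avg_quantity": mean(r["total_item_quantity"] for r in members),
        # Exact distinct count, standing in for the approximate Spark function.
        "distinct_users": len({r["user_id"] for r in members}),
    }
    for state, members in groups.items()
}

print(state_aggregates)
```

The point of the sketch is that one pass over the groups yields one output row per state with several named columns, which is what separates agg from single-purpose methods such as count and sum.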

Copyright notice
This article was written by [wangyanglongcc]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204220408303663.html