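How to Build Pivot Tables in Python

Introduction

This article shows how to build pivot tables over a dataset in Python. It is a practical technique, so it is shared here for reference.

When working with data, you often need to group it and compute means or counts. In Microsoft Excel, a pivot table handles simple grouped calculations easily. For more complex grouped calculations, the pandas package in Python can help.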

1. Data

First, import the required packages:

import pandas as pd
import numpy as np
from pandas import DataFrame, Series

Construct the dataset in code:

data = DataFrame({'key1': ['a', 'b', 'c', 'a', 'c', 'a', 'b', 'a', 'c', 'a', 'b', 'c'],
                  'key2': ['one', 'two', 'three', 'two', 'one', 'one', 'three', 'one', 'two', 'three', 'one', 'two'],
                  'num1': np.random.rand(12),
                  'num2': np.random.randn(12)})

The resulting dataset looks like this (num1 and num2 are random, so the values differ on every run; the first 10 of the 12 rows are shown):

  key1   key2      num1       num2
0    a    one  0.268705   0.084091
1    b    two  0.876707   0.217794
2    c  three  0.229999   0.574402
3    a    two  0.707990  -1.444415
4    c    one  0.786064   0.343244
5    a    one  0.587273   1.212391
6    b  three  0.927396   1.505372
7    a    one  0.295271  -0.497633
8    c    two  0.292721   0.098814
9    a  three  0.369788  -1.157426
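Since the numeric columns are random, a fixed seed makes the dataset repeatable between runs. The following is a minimal sketch, assuming NumPy's `default_rng` generator; the seed value 0 is an arbitrary choice for illustration and does not come from the article, so the numbers will not match the listing above.

```python
import numpy as np
import pandas as pd

# Seed 0 is an arbitrary assumption for this sketch, not taken from the article.
rng = np.random.default_rng(0)

data = pd.DataFrame({
    "key1": ["a", "b", "c", "a", "c", "a", "b", "a", "c", "a", "b", "c"],
    "key2": ["one", "two", "three", "two", "one", "one",
             "three", "one", "two", "three", "one", "two"],
    "num1": rng.random(12),           # uniform on [0, 1), like np.random.rand
    "num2": rng.standard_normal(12),  # standard normal, like np.random.randn
})

# Show the first ten rows, as in the listing above.
print(data.head(10))
```

Running the script twice should print identical values, which makes later pivot results easier to compare.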

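2. Cross-tabulation: counting by category

Counting observations by category is the most common pivot operation. It can be done in any of the following ways: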

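(1) crosstab

# Function:
crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, dropna=True, normalize=False)

index and columns are the two arguments of crosstab that must be supplied:

pd.crosstab(data.key1, data.key2)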

Result:

key2  one  three  two
key1
a       3      1    1
b       1      1    1
c       1      1    2
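The `normalize` parameter in the signature above converts counts into proportions. The sketch below rebuilds only the two key columns of the dataset and compares raw counts with row shares; the variable names `counts` and `row_shares` are illustrative.

```python
import pandas as pd

data = pd.DataFrame({
    "key1": ["a", "b", "c", "a", "c", "a", "b", "a", "c", "a", "b", "c"],
    "key2": ["one", "two", "three", "two", "one", "one",
             "three", "one", "two", "three", "one", "two"],
})

# Raw counts for every key1/key2 combination.
counts = pd.crosstab(data.key1, data.key2)

# normalize="index" divides each row by its own total,
# so every row of the result sums to 1.
row_shares = pd.crosstab(data.key1, data.key2, normalize="index")

print(counts)
print(row_shares)
```

Passing `normalize="columns"` or `normalize="all"` instead divides by column totals or by the grand total.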

To add totals along the margins, set margins to True:

pd.crosstab(data.key1, data.key2, margins=True)

Result:

key2  one  three  two  All
key1
a       3      1    1    5
b       1      1    1    3
c       1      1    2    4
All     5      3    4   12
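The label of the totals row and column can be changed with `margins_name`. A small sketch on the key columns of the dataset follows; the label "Total" is an arbitrary choice for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    "key1": ["a", "b", "c", "a", "c", "a", "b", "a", "c", "a", "b", "c"],
    "key2": ["one", "two", "three", "two", "one", "one",
             "three", "one", "two", "three", "one", "two"],
})

# margins=True appends a totals row and column;
# margins_name replaces the default "All" label ("Total" is illustrative).
with_totals = pd.crosstab(data.key1, data.key2,
                          margins=True, margins_name="Total")

print(with_totals)
```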

(2) pivot_table

Function:

pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All')

The pivot_table function can do the same thing. Its aggregation function defaults to aggfunc='mean', so set aggfunc='count' instead:

data.pivot_table('num1', index='key1', columns='key2', aggfunc='count')

The result is the same:

key2  one  three  two
key1
a       3      1    1
b       1      1    1
c       1      1    2
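The two approaches can differ when a combination never occurs: crosstab reports 0, while pivot_table with aggfunc='count' leaves the cell as NaN unless `fill_value` is given. The sketch below uses only the first ten rows of the dataset, where b never appears together with one; the num1 values are placeholders chosen for illustration.

```python
import pandas as pd

# First ten rows of the dataset; num1 holds placeholder values (an assumption).
data = pd.DataFrame({
    "key1": ["a", "b", "c", "a", "c", "a", "b", "a", "c", "a"],
    "key2": ["one", "two", "three", "two", "one", "one",
             "three", "one", "two", "three"],
    "num1": [float(i) for i in range(10)],
})

by_crosstab = pd.crosstab(data.key1, data.key2)

# Without fill_value, the missing (b, one) combination shows up as NaN.
by_pivot = data.pivot_table(values="num1", index="key1",
                            columns="key2", aggfunc="count")

# fill_value=0 replaces those missing cells with 0.
by_pivot_filled = data.pivot_table(values="num1", index="key1",
                                   columns="key2", aggfunc="count",
                                   fill_value=0)

print(by_crosstab)
print(by_pivot)
print(by_pivot_filled)
```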

(3) groupby

Using groupby is somewhat more involved: first group the data by key1 and key2, then apply the aggregation, and finally reshape the key2 index level into columns:

data.groupby(['key1', 'key2'])['num1'].count().unstack()

Result:

key2  one  three  two
key1
a       3      1    1
b       1      1    1
c       1      1    2
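With groupby, `count()` counts non-missing values in the selected column, whereas `size()` counts rows. The sketch below shows the difference by placing one NaN in num1; the num1 values are placeholders chosen for illustration, not the article's data.

```python
import numpy as np
import pandas as pd

# num1 holds placeholder values; the first one is deliberately missing.
num1 = [np.nan] + [float(i) for i in range(1, 12)]

data = pd.DataFrame({
    "key1": ["a", "b", "c", "a", "c", "a", "b", "a", "c", "a", "b", "c"],
    "key2": ["one", "two", "three", "two", "one", "one",
             "three", "one", "two", "three", "one", "two"],
    "num1": num1,
})

grouped = data.groupby(["key1", "key2"])

# count() skips NaN in num1, so (a, one) is 2 rather than 3.
n_values = grouped["num1"].count().unstack()

# size() counts rows regardless of NaN; fill_value=0 covers absent pairs.
n_rows = grouped.size().unstack(fill_value=0)

print(n_values)
print(n_rows)
```

When the goal is to match the crosstab counts, `size()` is the closer equivalent because crosstab also counts rows.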

3. Other pivot table operations

(1) pivot_table

pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All')

Whatever calculation you need, just specify it through aggfunc.

By default, the mean is computed:

data.pivot_table(index='key1', columns='key2')

          num1                          num2
key2       one     three       two       one     three       two
key1
a     0.193332  0.705657  0.203155 -0.165749  2.398164 -1.293595
b     0.167947  0.204545  0.661460  0.555850 -0.522528  0.143530
c     0.496993  0.033673  0.206028 -0.115093  0.024650  0.077726
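When `values` is omitted, the result carries a two-level column index: the value column on top and key2 below. The sketch below uses fixed placeholder numbers (an assumption, in place of the article's random data) so that individual cells can be checked by hand.

```python
import numpy as np
import pandas as pd

# Placeholder values 0..11 and 0..110 stand in for the random columns.
data = pd.DataFrame({
    "key1": ["a", "b", "c", "a", "c", "a", "b", "a", "c", "a", "b", "c"],
    "key2": ["one", "two", "three", "two", "one", "one",
             "three", "one", "two", "three", "one", "two"],
    "num1": np.arange(12, dtype=float),
    "num2": np.arange(12, dtype=float) * 10,
})

# Default aggfunc is the mean of every remaining numeric column.
means = data.pivot_table(index="key1", columns="key2")

# Select one value column first, then a key1/key2 cell.
a_one = means["num1"].loc["a", "one"]   # rows 0, 5, 7 -> (0 + 5 + 7) / 3
c_two = means["num1"].loc["c", "two"]   # rows 8, 11  -> (8 + 11) / 2

print(means)
print(a_one, c_two)
```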

Group and sum:

data.pivot_table(index='key1', columns='key2', aggfunc='sum')

Result:

          num1                          num2
key2       one     three       two       one     three       two
key1
a     0.579996  0.705657  0.203155 -0.497246  2.398164 -1.293595
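The sum and mean tables are linked through the counts: each sum equals the mean multiplied by the number of rows in that cell, which is why the a/one sum in the result is three times the a/one mean. The sketch below checks this relationship on fixed placeholder numbers (an assumption, in place of the article's random data).

```python
import numpy as np
import pandas as pd

# Placeholder values stand in for the random columns.
data = pd.DataFrame({
    "key1": ["a", "b", "c", "a", "c", "a", "b", "a", "c", "a", "b", "c"],
    "key2": ["one", "two", "three", "two", "one", "one",
             "three", "one", "two", "three", "one", "two"],
    "num1": np.arange(12, dtype=float),
    "num2": np.arange(12, dtype=float) * 10,
})

sums = data.pivot_table(index="key1", columns="key2", aggfunc="sum")
means = data.pivot_table(index="key1", columns="key2")
counts = pd.crosstab(data.key1, data.key2)

# sum == mean * count, cell by cell, for each value column.
num1_matches = np.allclose(sums["num1"], means["num1"] * counts)
num2_matches = np.allclose(sums["num2"], means["num2"] * counts)

print(sums)
print(num1_matches, num2_matches)
```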