tynbl.github.io

Advanced Pandas and Tips

1. Creating Pandas Objects

import pandas as pd

country1 = pd.Series({'Name': '中国',
                    'Language': 'Chinese',
                    'Area': '9.597M km2',
                     'Happiness Rank': 79})

country2 = pd.Series({'Name': '美国',
                    'Language': 'English (US)',
                    'Area': '9.834M km2',
                     'Happiness Rank': 14})

country3 = pd.Series({'Name': '澳大利亚',
                    'Language': 'English (AU)',
                    'Area': '7.692M km2',
                     'Happiness Rank': 9})

df = pd.DataFrame([country1, country2, country3], index=['CH', 'US', 'AU'])
# Note the difference in Jupyter between calling print and displaying the value directly
print(df)
df
          Area  Happiness Rank      Language  Name
CH  9.597M km2              79       Chinese    中国
US  9.834M km2              14  English (US)    美国
AU  7.692M km2               9  English (AU)  澳大利亚
          Area  Happiness Rank      Language  Name
CH  9.597M km2              79       Chinese    中国
US  9.834M km2              14  English (US)    美国
AU  7.692M km2               9  English (AU)  澳大利亚
# Add data
# A scalar value is automatically "broadcast" to every row
# A list whose length does not match the number of rows raises an error
df['Location'] = '地球'
print(df)

df['Region'] = ['亚洲', '北美洲', '大洋洲']
print(df)
df
          Area  Happiness Rank      Language  Name Location
CH  9.597M km2              79       Chinese    中国       地球
US  9.834M km2              14  English (US)    美国       地球
AU  7.692M km2               9  English (AU)  澳大利亚       地球
          Area  Happiness Rank      Language  Name Location Region
CH  9.597M km2              79       Chinese    中国       地球     亚洲
US  9.834M km2              14  English (US)    美国       地球    北美洲
AU  7.692M km2               9  English (AU)  澳大利亚       地球    大洋洲
          Area  Happiness Rank      Language  Name Location Region
CH  9.597M km2              79       Chinese    中国       地球     亚洲
US  9.834M km2              14  English (US)    美国       地球    北美洲
AU  7.692M km2               9  English (AU)  澳大利亚       地球    大洋洲
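The two assignment rules for a new column can be checked with a small sketch. The frame below is a reduced, hypothetical stand-in for the one above: a scalar is repeated for every row, while a list with the wrong number of values (here two values for three rows) is rejected with a `ValueError`.

```python
import pandas as pd

# Hypothetical three-row frame standing in for the one built above.
df = pd.DataFrame({'Name': ['A', 'B', 'C']}, index=['CH', 'US', 'AU'])

# A scalar is broadcast to every row.
df['Location'] = 'Earth'

# A list must supply exactly one value per row; otherwise pandas raises ValueError.
try:
    df['Region'] = ['Asia', 'North America']  # only 2 values for 3 rows
    length_error = None
except ValueError as exc:
    length_error = exc

print(df)
print('Error:', length_error)
```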

2. Indexing in Pandas

# Row indexing
print('loc:')
print(df.loc['CH'])
print(type(df.loc['CH']))

print('iloc:')
print(df.iloc[1])
loc:
Area              9.597M km2
Happiness Rank            79
Language             Chinese
Name                      中国
Location                  地球
Region                    亚洲
Name: CH, dtype: object
<class 'pandas.core.series.Series'>
iloc:
Area                9.834M km2
Happiness Rank              14
Language          English (US)
Name                        美国
Location                    地球
Region                     北美洲
Name: US, dtype: object
# Column indexing
print(df['Area'])
print(type(df['Area']))
CH    9.597M km2
US    9.834M km2
AU    7.692M km2
Name: Area, dtype: object
<class 'pandas.core.series.Series'>
# Select several non-adjacent columns
print(df[['Name', 'Area']])
    Name        Area
CH    中国  9.597M km2
US    美国  9.834M km2
AU  澳大利亚  7.692M km2
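A point that is easy to miss in the two cells above is that the bracket style decides the return type. The sketch below uses a reduced, hypothetical frame: a single label returns a `Series`, while a list of labels returns a `DataFrame`, even when the list holds only one label.

```python
import pandas as pd

# Hypothetical two-row frame with the same column names as above.
df = pd.DataFrame(
    {'Name': ['China', 'USA'], 'Area': ['9.597M km2', '9.834M km2']},
    index=['CH', 'US'],
)

as_series = df['Area']    # single label -> Series
as_frame = df[['Area']]   # list with one label -> one-column DataFrame

print(type(as_series).__name__)
print(type(as_frame).__name__, as_frame.shape)
```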
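# Mixed indexing
# Note the differences in syntax
print('Select the column first, then the row:')
print(df['Area']['CH'])
print(df['Area'].loc['CH'])
print(df['Area'].iloc[0])

print('Select the row first, then the column:')
print(df.loc['CH']['Area'])
print(df.iloc[0]['Area'])
Select the column first, then the row:
9.597M km2
9.597M km2
9.597M km2
Select the row first, then the column:
9.597M km2
9.597M km2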
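The chained forms above work for reading. As an alternative sketch (not part of the original notebook), the row and the column can be passed to `loc`, `iloc`, or `at` in a single call, which is also the form that works reliably for assignment. The frame is a reduced, hypothetical version of the one above.

```python
import pandas as pd

# Hypothetical two-row frame with the same labels as above.
df = pd.DataFrame(
    {'Area': ['9.597M km2', '9.834M km2'], 'Happiness Rank': [79, 14]},
    index=['CH', 'US'],
)

by_label = df.loc['CH', 'Area']     # row label, column label
by_position = df.iloc[0, 0]         # row position, column position
single_value = df.at['CH', 'Area']  # label-based access to one scalar

# Assigning through a single .loc call updates df itself.
df.loc['CH', 'Happiness Rank'] = 80

print(by_label, by_position, single_value)
print(df)
```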
# Transpose rows and columns
print(df.T)
                        CH            US            AU
Area            9.597M km2    9.834M km2    7.692M km2
Happiness Rank          79            14             9
Language           Chinese  English (US)  English (AU)
Name                    中国            美国          澳大利亚
Location                地球            地球            地球
Region                  亚洲           北美洲           大洋洲

3. Deleting Data

print(df.drop(['CH']))
# Note that drop returns a modified copy and does not change the original data
print(df)
          Area  Happiness Rank      Language  Name Location Region
US  9.834M km2              14  English (US)    美国       地球    北美洲
AU  7.692M km2               9  English (AU)  澳大利亚       地球    大洋洲
          Area  Happiness Rank      Language  Name Location Region
CH  9.597M km2              79       Chinese    中国       地球     亚洲
US  9.834M km2              14  English (US)    美国       地球    北美洲
AU  7.692M km2               9  English (AU)  澳大利亚       地球    大洋洲
print(df.drop(['CH'], inplace=True))
# With inplace=True the original data is modified in place and None is returned instead of a copy
print(df)
None
          Area  Happiness Rank      Language  Name Location Region
US  9.834M km2              14  English (US)    美国       地球    北美洲
AU  7.692M km2               9  English (AU)  澳大利亚       地球    大洋洲
# To delete a column, specify axis=1
print(df.drop(['Area'], axis=1))
print(df)
    Happiness Rank      Language  Name Location Region
US              14  English (US)    美国       地球    北美洲
AU               9  English (AU)  澳大利亚       地球    大洋洲
          Area  Happiness Rank      Language  Name Location Region
US  9.834M km2              14  English (US)    美国       地球    北美洲
AU  7.692M km2               9  English (AU)  澳大利亚       地球    大洋洲
# The del keyword can also be used directly
del df['Name']
print(df)
          Area  Happiness Rank      Language Location Region
US  9.834M km2              14  English (US)       地球    北美洲
AU  7.692M km2               9  English (AU)       地球    大洋洲
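As a complement to the `axis` argument used above, `drop` also accepts the `index=` and `columns=` keywords, which some readers find easier to scan. The sketch below runs on a reduced, hypothetical frame and rebinds the result to a new name instead of using `inplace=True`.

```python
import pandas as pd

# Hypothetical three-row frame with the same labels as above.
df = pd.DataFrame(
    {'Area': ['9.597M km2', '9.834M km2', '7.692M km2'],
     'Happiness Rank': [79, 14, 9]},
    index=['CH', 'US', 'AU'],
)

without_row = df.drop(index='CH')      # same effect as df.drop(['CH'])
without_col = df.drop(columns='Area')  # same effect as df.drop(['Area'], axis=1)

# Chaining and binding the result is an alternative to inplace=True.
trimmed = df.drop(index='CH').drop(columns='Area')

print(trimmed)
print(df.shape)  # the original frame is untouched
```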

4. Operating on and Loading DataFrames

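# Note: in the pandas version used here, modifying data taken out of a DataFrame also changes the original data
# (with Copy-on-Write enabled, the original is left unchanged)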
ranks = df['Happiness Rank']
ranks += 2
print(ranks)
print(df)
US    16
AU    11
Name: Happiness Rank, dtype: int64
          Area  Happiness Rank      Language Location Region
US  9.834M km2              16  English (US)       地球    北美洲
AU  7.692M km2              11  English (AU)       地球    大洋洲
# As noted above, modifying data taken out of a DataFrame can change the original data
# The safe approach is to use copy()
ranks = df['Happiness Rank'].copy()
ranks += 2
print(ranks)
print(df)
US    18
AU    13
Name: Happiness Rank, dtype: int64
          Area  Happiness Rank      Language Location Region
US  9.834M km2              16  English (US)       地球    北美洲
AU  7.692M km2              11  English (AU)       地球    大洋洲
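The `copy()` pattern can be verified on its own with the sketch below, which uses a reduced, hypothetical frame. The copy is modified first while the frame stays as it was, and the new values are then written back with an explicit assignment, so the update to the frame is visible in the code rather than a side effect.

```python
import pandas as pd

# Hypothetical two-row frame with the ranks used above.
df = pd.DataFrame({'Happiness Rank': [14, 9]}, index=['US', 'AU'])

ranks = df['Happiness Rank'].copy()  # independent copy of the column
ranks += 2

unchanged = df['Happiness Rank'].tolist()  # the frame still holds the old values

# Write the result back explicitly when the frame should change.
df['Happiness Rank'] = ranks

print(unchanged)
print(df)
```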
# Load data from a CSV file
reprot_2015_df = pd.read_csv('./2015.csv')
print('Preview of the 2015 data:')
#print(reprot_2015_df.head())
reprot_2015_df.head()
Preview of the 2015 data:
Country Region Happiness Rank Happiness Score Standard Error Economy (GDP per Capita) Family Health (Life Expectancy) Freedom Trust (Government Corruption) Generosity Dystopia Residual
0 Switzerland Western Europe 1 7.587 0.03411 1.39651 1.34951 0.94143 0.66557 0.41978 0.29678 2.51738
1 Iceland Western Europe 2 7.561 0.04884 1.30232 1.40223 0.94784 0.62877 0.14145 0.43630 2.70201
2 Denmark Western Europe 3 7.527 0.03328 1.32548 1.36058 0.87464 0.64938 0.48357 0.34139 2.49204
3 Norway Western Europe 4 7.522 0.03880 1.45900 1.33095 0.88521 0.66973 0.36503 0.34699 2.46531
4 Canada North America 5 7.427 0.03553 1.32629 1.32261 0.90563 0.63297 0.32957 0.45811 2.45176
print(reprot_2015_df.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 158 entries, 0 to 157
Data columns (total 12 columns):
Country                          158 non-null object
Region                           158 non-null object
Happiness Rank                   158 non-null int64
Happiness Score                  158 non-null float64
Standard Error                   158 non-null float64
Economy (GDP per Capita)         158 non-null float64
Family                           158 non-null float64
Health (Life Expectancy)         158 non-null float64
Freedom                          158 non-null float64
Trust (Government Corruption)    158 non-null float64
Generosity                       158 non-null float64
Dystopia Residual                158 non-null float64
dtypes: float64(9), int64(1), object(2)
memory usage: 14.9+ KB
None
# Use index_col to specify the index column
# Use usecols to specify which columns to read
reprot_2016_df = pd.read_csv('./2016.csv', 
                             index_col='Country',
                             usecols=['Country', 'Happiness Rank', 'Happiness Score', 'Region'])
# Preview the data
reprot_2016_df.head()
                     Region  Happiness Rank  Happiness Score
Country
Denmark      Western Europe               1            7.526
Switzerland  Western Europe               2            7.509
Iceland      Western Europe               3            7.501
Norway       Western Europe               4            7.498
Finland      Western Europe               5            7.413
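Note that in the preview above the columns appear in file order (`Region` first), not in the order given to `usecols`. The sketch below reproduces this without the real file: the CSV text is a hypothetical stand-in that reuses two rows from the preview, and the `Extra` column is invented to show that unlisted columns are skipped.

```python
import io
import pandas as pd

# Hypothetical stand-in for the contents of a file such as 2016.csv.
csv_text = """Country,Region,Happiness Rank,Happiness Score,Extra
Denmark,Western Europe,1,7.526,x
Switzerland,Western Europe,2,7.509,y
"""

report = pd.read_csv(
    io.StringIO(csv_text),
    index_col='Country',
    usecols=['Country', 'Happiness Rank', 'Happiness Score', 'Region'],
)

print(report.columns.tolist())  # file order, not the order listed in usecols
print(report.index.name)
```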
print('Column names (columns):', reprot_2016_df.columns)
print('Row labels (index):', reprot_2016_df.index)
Column names (columns): Index(['Region', 'Happiness Rank', 'Happiness Score'], dtype='object')
Row labels (index): Index(['Denmark', 'Switzerland', 'Iceland', 'Norway', 'Finland', 'Canada',
       'Netherlands', 'New Zealand', 'Australia', 'Sweden',
       ...
       'Madagascar', 'Tanzania', 'Liberia', 'Guinea', 'Rwanda', 'Benin',
       'Afghanistan', 'Togo', 'Syria', 'Burundi'],
      dtype='object', name='Country', length=157)
# Note that an Index is immutable
reprot_2016_df.index[0] = '丹麦'
---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-19-fe3f4b6af8cf> in <module>()
      1 # Note that an Index is immutable
----> 2 reprot_2016_df.index[0] = '丹麦'


D:\Users\Jimmy\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in __setitem__(self, key, value)
   1618 
   1619     def __setitem__(self, key, value):
-> 1620         raise TypeError("Index does not support mutable operations")
   1621 
   1622     def __getitem__(self, key):


TypeError: Index does not support mutable operations
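Since item assignment on an `Index` fails, one way to change a single row label is `rename(index=...)`, which builds a new frame with the label replaced. This is a minimal sketch on a hypothetical two-row frame, not the notebook's own next step.

```python
import pandas as pd

# Hypothetical two-row frame indexed by country.
df = pd.DataFrame(
    {'Happiness Rank': [1, 2]},
    index=pd.Index(['Denmark', 'Switzerland'], name='Country'),
)

try:
    df.index[0] = '丹麦'  # item assignment on an Index is rejected
    raised = False
except TypeError:
    raised = True

# rename returns a new DataFrame whose index carries the new label.
renamed = df.rename(index={'Denmark': '丹麦'})

print(raised)
print(renamed)
```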
# Reset the index
# Note the difference between using inplace and not using it
reprot_2016_df.reset_index().head()
       Country          Region  Happiness Rank  Happiness Score
0      Denmark  Western Europe               1            7.526
1  Switzerland  Western Europe               2            7.509
2      Iceland  Western Europe               3            7.501
3       Norway  Western Europe               4            7.498
4      Finland  Western Europe               5            7.413
# Rename columns (without inplace, a renamed copy is returned and the original stays unchanged)
reprot_2016_df.rename(columns={'Region': '地区', 'Happiness Rank': '排名', 'Happiness Score': '幸福指数'})
reprot_2016_df.head()
                     Region  Happiness Rank  Happiness Score
Country
Denmark      Western Europe               1            7.526
Switzerland  Western Europe               2            7.509
Iceland      Western Europe               3            7.501
Norway       Western Europe               4            7.498
Finland      Western Europe               5            7.413
# Rename columns; note the use of inplace
reprot_2016_df.rename(columns={'Region': '地区', 'Happiness Rank': '排名', 'Happiness Score': '幸福指数'},
                     inplace=True)
reprot_2016_df.head()
地区 排名 幸福指数
Country
Denmark Western Europe 1 7.526
Switzerland Western Europe 2 7.509
Iceland Western Europe 3 7.501
Norway Western Europe 4 7.498
Finland Western Europe 5 7.413
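One pitfall with `rename` is that, by default, keys that match no existing label are ignored, so a misspelled column name goes unnoticed. The sketch below uses a hypothetical one-row frame and a deliberately misspelled key to show the default behaviour next to `errors='raise'`, which turns the typo into a `KeyError`.

```python
import pandas as pd

# Hypothetical one-row frame; 'Rank' is an illustrative target name.
df = pd.DataFrame({'Region': ['Western Europe'], 'Happiness Rank': [1]})

# 'Hapiness Rank' is deliberately misspelled: by default the key is ignored.
silently_ignored = df.rename(columns={'Hapiness Rank': 'Rank'})

# With errors='raise', an unknown key raises KeyError instead.
try:
    df.rename(columns={'Hapiness Rank': 'Rank'}, errors='raise')
    raised = False
except KeyError:
    raised = True

print(silently_ignored.columns.tolist())
print(raised)
```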

5. Boolean Mask

# Filter countries in the Western Europe region
only_western_europe = reprot_2016_df['地区'] == 'Western Europe'
only_western_europe
Country
Denmark                  True
Switzerland              True
Iceland                  True
Norway                   True
Finland                  True
Canada                  False
Netherlands              True
New Zealand             False
Australia               False
Sweden                   True
Israel                  False
Austria                  True
United States           False
Costa Rica              False
Puerto Rico             False
Germany                  True
Brazil                  False
Belgium                  True
Ireland                  True
Luxembourg               True
Mexico                  False
Singapore               False
United Kingdom           True
Chile                   False
Panama                  False
Argentina               False
Czech Republic          False
United Arab Emirates    False
Uruguay                 False
Malta                    True
                        ...  
Senegal                 False
Bulgaria                False
Mauritania              False
Zimbabwe                False
Malawi                  False
Sudan                   False
Gabon                   False
Mali                    False
Haiti                   False
Botswana                False
Comoros                 False
Ivory Coast             False
Cambodia                False
Angola                  False
Niger                   False
South Sudan             False
Chad                    False
Burkina Faso            False
Uganda                  False
Yemen                   False
Madagascar              False
Tanzania                False
Liberia                 False
Guinea                  False
Rwanda                  False
Benin                   False
Afghanistan             False
Togo                    False
Syria                   False
Burundi                 False
Name: 地区, Length: 157, dtype: bool
# Filter countries in the Western Europe region
# that are also ranked outside the top 10
only_western_europe_10 = (reprot_2016_df['地区'] == 'Western Europe') & (reprot_2016_df['排名'] > 10)
only_western_europe_10
Country
Denmark                 False
Switzerland             False
Iceland                 False
Norway                  False
Finland                 False
Canada                  False
Netherlands             False
New Zealand             False
Australia               False
Sweden                  False
Israel                  False
Austria                  True
United States           False
Costa Rica              False
Puerto Rico             False
Germany                  True
Brazil                  False
Belgium                  True
Ireland                  True
Luxembourg               True
Mexico                  False
Singapore               False
United Kingdom           True
Chile                   False
Panama                  False
Argentina               False
Czech Republic          False
United Arab Emirates    False
Uruguay                 False
Malta                    True
                        ...  
Senegal                 False
Bulgaria                False
Mauritania              False
Zimbabwe                False
Malawi                  False
Sudan                   False
Gabon                   False
Mali                    False
Haiti                   False
Botswana                False
Comoros                 False
Ivory Coast             False
Cambodia                False
Angola                  False
Niger                   False
South Sudan             False
Chad                    False
Burkina Faso            False
Uganda                  False
Yemen                   False
Madagascar              False
Tanzania                False
Liberia                 False
Guinea                  False
Rwanda                  False
Benin                   False
Afghanistan             False
Togo                    False
Syria                   False
Burundi                 False
Length: 157, dtype: bool
# Apply the combined boolean mask to get the final result
reprot_2016_df[only_western_europe_10]
地区 排名 幸福指数
Country
Austria Western Europe 12 7.119
Germany Western Europe 16 6.994
Belgium Western Europe 18 6.929
Ireland Western Europe 19 6.907
Luxembourg Western Europe 20 6.871
United Kingdom Western Europe 23 6.725
Malta Western Europe 30 6.488
France Western Europe 32 6.478
Spain Western Europe 37 6.361
Italy Western Europe 50 5.977
North Cyprus Western Europe 62 5.771
Cyprus Western Europe 69 5.546
Portugal Western Europe 94 5.123
Greece Western Europe 99 5.033
# Once this is familiar, it can be written on a single line
reprot_2016_df[(reprot_2016_df['地区'] == 'Western Europe') & (reprot_2016_df['排名'] > 10)]
地区 排名 幸福指数
Country
Austria Western Europe 12 7.119
Germany Western Europe 16 6.994
Belgium Western Europe 18 6.929
Ireland Western Europe 19 6.907
Luxembourg Western Europe 20 6.871
United Kingdom Western Europe 23 6.725
Malta Western Europe 30 6.488
France Western Europe 32 6.478
Spain Western Europe 37 6.361
Italy Western Europe 50 5.977
North Cyprus Western Europe 62 5.771
Cyprus Western Europe 69 5.546
Portugal Western Europe 94 5.123
Greece Western Europe 99 5.033
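Masks combine with `&` (and), `|` (or), and `~` (not). Each comparison needs its own parentheses because these operators bind more tightly than `==` and `>` in Python. The sketch below uses a hypothetical three-row frame with a shortened column name `Rank`, and also shows `loc` taking a mask and a column in one step.

```python
import pandas as pd

# Hypothetical three-row frame; 'Rank' is a shortened illustrative column name.
df = pd.DataFrame(
    {'Region': ['Western Europe', 'North America', 'Western Europe'],
     'Rank': [1, 6, 12]},
    index=['Denmark', 'Canada', 'Austria'],
)

in_western_europe = df['Region'] == 'Western Europe'
outside_top_10 = df['Rank'] > 10

both = df[in_western_europe & outside_top_10]    # AND
either = df[in_western_europe | outside_top_10]  # OR
not_we = df[~in_western_europe]                  # NOT

# .loc accepts a mask and a column selection in one step.
ranks = df.loc[in_western_europe & outside_top_10, 'Rank']

print(both)
print(ranks)
```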

6. Hierarchical Indexing

reprot_2015_df.head()
Country Region Happiness Rank Happiness Score Standard Error Economy (GDP per Capita) Family Health (Life Expectancy) Freedom Trust (Government Corruption) Generosity Dystopia Residual
0 Switzerland Western Europe 1 7.587 0.03411 1.39651 1.34951 0.94143 0.66557 0.41978 0.29678 2.51738
1 Iceland Western Europe 2 7.561 0.04884 1.30232 1.40223 0.94784 0.62877 0.14145 0.43630 2.70201
2 Denmark Western Europe 3 7.527 0.03328 1.32548 1.36058 0.87464 0.64938 0.48357 0.34139 2.49204
3 Norway Western Europe 4 7.522 0.03880 1.45900 1.33095 0.88521 0.66973 0.36503 0.34699 2.46531
4 Canada North America 5 7.427 0.03553 1.32629 1.32261 0.90563 0.63297 0.32957 0.45811 2.45176
# Set a hierarchical index
report_2015_df2 = reprot_2015_df.set_index(['Region', 'Country'])
report_2015_df2.head(20)
Happiness Rank Happiness Score Standard Error Economy (GDP per Capita) Family Health (Life Expectancy) Freedom Trust (Government Corruption) Generosity Dystopia Residual
Region Country
Western Europe Switzerland 1 7.587 0.03411 1.39651 1.34951 0.94143 0.66557 0.41978 0.29678 2.51738
Iceland 2 7.561 0.04884 1.30232 1.40223 0.94784 0.62877 0.14145 0.43630 2.70201
Denmark 3 7.527 0.03328 1.32548 1.36058 0.87464 0.64938 0.48357 0.34139 2.49204
Norway 4 7.522 0.03880 1.45900 1.33095 0.88521 0.66973 0.36503 0.34699 2.46531
North America Canada 5 7.427 0.03553 1.32629 1.32261 0.90563 0.63297 0.32957 0.45811 2.45176
Western Europe Finland 6 7.406 0.03140 1.29025 1.31826 0.88911 0.64169 0.41372 0.23351 2.61955
Netherlands 7 7.378 0.02799 1.32944 1.28017 0.89284 0.61576 0.31814 0.47610 2.46570
Sweden 8 7.364 0.03157 1.33171 1.28907 0.91087 0.65980 0.43844 0.36262 2.37119
Australia and New Zealand New Zealand 9 7.286 0.03371 1.25018 1.31967 0.90837 0.63938 0.42922 0.47501 2.26425
Australia 10 7.284 0.04083 1.33358 1.30923 0.93156 0.65124 0.35637 0.43562 2.26646
Middle East and Northern Africa Israel 11 7.278 0.03470 1.22857 1.22393 0.91387 0.41319 0.07785 0.33172 3.08854
Latin America and Caribbean Costa Rica 12 7.226 0.04454 0.95578 1.23788 0.86027 0.63376 0.10583 0.25497 3.17728
Western Europe Austria 13 7.200 0.03751 1.33723 1.29704 0.89042 0.62433 0.18676 0.33088 2.53320
Latin America and Caribbean Mexico 14 7.187 0.04176 1.02054 0.91451 0.81444 0.48181 0.21312 0.14074 3.60214
North America United States 15 7.119 0.03839 1.39451 1.24711 0.86179 0.54604 0.15890 0.40105 2.51011
Latin America and Caribbean Brazil 16 6.983 0.04076 0.98124 1.23287 0.69702 0.49049 0.17521 0.14574 3.26001
Western Europe Luxembourg 17 6.946 0.03499 1.56391 1.21963 0.91894 0.61583 0.37798 0.28034 1.96961
Ireland 18 6.940 0.03676 1.33596 1.36948 0.89533 0.61777 0.28703 0.45901 1.97570
Belgium 19 6.937 0.03595 1.30782 1.28566 0.89667 0.58450 0.22540 0.22250 2.41484
Middle East and Northern Africa United Arab Emirates 20 6.901 0.03729 1.42727 1.12575 0.80925 0.64157 0.38583 0.26428 2.24743
# Index by level 0
report_2015_df2.loc['Western Europe']
Happiness Rank Happiness Score Standard Error Economy (GDP per Capita) Family Health (Life Expectancy) Freedom Trust (Government Corruption) Generosity Dystopia Residual
Country
Switzerland 1 7.587 0.03411 1.39651 1.34951 0.94143 0.66557 0.41978 0.29678 2.51738
Iceland 2 7.561 0.04884 1.30232 1.40223 0.94784 0.62877 0.14145 0.43630 2.70201
Denmark 3 7.527 0.03328 1.32548 1.36058 0.87464 0.64938 0.48357 0.34139 2.49204
Norway 4 7.522 0.03880 1.45900 1.33095 0.88521 0.66973 0.36503 0.34699 2.46531
Finland 6 7.406 0.03140 1.29025 1.31826 0.88911 0.64169 0.41372 0.23351 2.61955
Netherlands 7 7.378 0.02799 1.32944 1.28017 0.89284 0.61576 0.31814 0.47610 2.46570
Sweden 8 7.364 0.03157 1.33171 1.28907 0.91087 0.65980 0.43844 0.36262 2.37119
Austria 13 7.200 0.03751 1.33723 1.29704 0.89042 0.62433 0.18676 0.33088 2.53320
Luxembourg 17 6.946 0.03499 1.56391 1.21963 0.91894 0.61583 0.37798 0.28034 1.96961
Ireland 18 6.940 0.03676 1.33596 1.36948 0.89533 0.61777 0.28703 0.45901 1.97570
Belgium 19 6.937 0.03595 1.30782 1.28566 0.89667 0.58450 0.22540 0.22250 2.41484
United Kingdom 21 6.867 0.01866 1.26637 1.28548 0.90943 0.59625 0.32067 0.51912 1.96994
Germany 26 6.750 0.01848 1.32792 1.29937 0.89186 0.61477 0.21843 0.28214 2.11569
France 29 6.575 0.03512 1.27778 1.26038 0.94579 0.55011 0.20646 0.12332 2.21126
Spain 36 6.329 0.03468 1.23011 1.31379 0.95562 0.45951 0.06398 0.18227 2.12367
Malta 37 6.302 0.04206 1.20740 1.30203 0.88721 0.60365 0.13586 0.51752 1.64880
Italy 50 5.948 0.03914 1.25114 1.19777 0.95446 0.26236 0.02901 0.22823 2.02518
North Cyprus 66 5.695 0.05635 1.20806 1.07008 0.92356 0.49027 0.14280 0.26169 1.59888
Cyprus 67 5.689 0.05580 1.20813 0.89318 0.92356 0.40672 0.06146 0.30638 1.88931
Portugal 88 5.102 0.04802 1.15991 1.13935 0.87519 0.51469 0.01078 0.13719 1.26462
Greece 102 4.857 0.05062 1.15406 0.92933 0.88213 0.07699 0.01397 0.00000 1.80101
# Index by both levels
report_2015_df2.loc['Western Europe', 'Switzerland']
Happiness Rank                   1.00000
Happiness Score                  7.58700
Standard Error                   0.03411
Economy (GDP per Capita)         1.39651
Family                           1.34951
Health (Life Expectancy)         0.94143
Freedom                          0.66557
Trust (Government Corruption)    0.41978
Generosity                       0.29678
Dystopia Residual                2.51738
Name: (Western Europe, Switzerland), dtype: float64
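The two-level lookup above returns a whole row. As a sketch on a reduced, hypothetical frame, a single cell can be read by passing the index tuple together with a column label, and `xs` can select on the inner level (`Country`) without naming the outer one.

```python
import pandas as pd

# Hypothetical three-row frame with the same two index levels as above.
df = pd.DataFrame(
    {'Region': ['Western Europe', 'Western Europe', 'North America'],
     'Country': ['Switzerland', 'Iceland', 'Canada'],
     'Happiness Rank': [1, 2, 5]},
).set_index(['Region', 'Country'])

# Index tuple plus column label -> one scalar.
one_cell = df.loc[('Western Europe', 'Switzerland'), 'Happiness Rank']

# xs selects on an inner level; the selected level is dropped from the result.
canada = df.xs('Canada', level='Country')

print(one_cell)
print(canada)
```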
# Swap the order of the index levels
report_2015_df2.swaplevel()
Happiness Rank Happiness Score Standard Error Economy (GDP per Capita) Family Health (Life Expectancy) Freedom Trust (Government Corruption) Generosity Dystopia Residual
Country Region
Switzerland Western Europe 1 7.587 0.03411 1.39651 1.34951 0.94143 0.66557 0.41978 0.29678 2.51738
Iceland Western Europe 2 7.561 0.04884 1.30232 1.40223 0.94784 0.62877 0.14145 0.43630 2.70201
Denmark Western Europe 3 7.527 0.03328 1.32548 1.36058 0.87464 0.64938 0.48357 0.34139 2.49204
Norway Western Europe 4 7.522 0.03880 1.45900 1.33095 0.88521 0.66973 0.36503 0.34699 2.46531
Canada North America 5 7.427 0.03553 1.32629 1.32261 0.90563 0.63297 0.32957 0.45811 2.45176
Finland Western Europe 6 7.406 0.03140 1.29025 1.31826 0.88911 0.64169 0.41372 0.23351 2.61955
Netherlands Western Europe 7 7.378 0.02799 1.32944 1.28017 0.89284 0.61576 0.31814 0.47610 2.46570
Sweden Western Europe 8 7.364 0.03157 1.33171 1.28907 0.91087 0.65980 0.43844 0.36262 2.37119
New Zealand Australia and New Zealand 9 7.286 0.03371 1.25018 1.31967 0.90837 0.63938 0.42922 0.47501 2.26425
Australia Australia and New Zealand 10 7.284 0.04083 1.33358 1.30923 0.93156 0.65124 0.35637 0.43562 2.26646
Israel Middle East and Northern Africa 11 7.278 0.03470 1.22857 1.22393 0.91387 0.41319 0.07785 0.33172 3.08854
Costa Rica Latin America and Caribbean 12 7.226 0.04454 0.95578 1.23788 0.86027 0.63376 0.10583 0.25497 3.17728
Austria Western Europe 13 7.200 0.03751 1.33723 1.29704 0.89042 0.62433 0.18676 0.33088 2.53320
Mexico Latin America and Caribbean 14 7.187 0.04176 1.02054 0.91451 0.81444 0.48181 0.21312 0.14074 3.60214
United States North America 15 7.119 0.03839 1.39451 1.24711 0.86179 0.54604 0.15890 0.40105 2.51011
Brazil Latin America and Caribbean 16 6.983 0.04076 0.98124 1.23287 0.69702 0.49049 0.17521 0.14574 3.26001
Luxembourg Western Europe 17 6.946 0.03499 1.56391 1.21963 0.91894 0.61583 0.37798 0.28034 1.96961
Ireland Western Europe 18 6.940 0.03676 1.33596 1.36948 0.89533 0.61777 0.28703 0.45901 1.97570
Belgium Western Europe 19 6.937 0.03595 1.30782 1.28566 0.89667 0.58450 0.22540 0.22250 2.41484
United Arab Emirates Middle East and Northern Africa 20 6.901 0.03729 1.42727 1.12575 0.80925 0.64157 0.38583 0.26428 2.24743
United Kingdom Western Europe 21 6.867 0.01866 1.26637 1.28548 0.90943 0.59625 0.32067 0.51912 1.96994
Oman Middle East and Northern Africa 22 6.853 0.05335 1.36011 1.08182 0.76276 0.63274 0.32524 0.21542 2.47489
Venezuela Latin America and Caribbean 23 6.810 0.06476 1.04424 1.25596 0.72052 0.42908 0.11069 0.05841 3.19131
Singapore Southeastern Asia 24 6.798 0.03780 1.52186 1.02000 1.02525 0.54252 0.49210 0.31105 1.88501
Panama Latin America and Caribbean 25 6.786 0.04910 1.06353 1.19850 0.79661 0.54210 0.09270 0.24434 2.84848
Germany Western Europe 26 6.750 0.01848 1.32792 1.29937 0.89186 0.61477 0.21843 0.28214 2.11569
Chile Latin America and Caribbean 27 6.670 0.05800 1.10715 1.12447 0.85857 0.44132 0.12869 0.33363 2.67585
Qatar Middle East and Northern Africa 28 6.611 0.06257 1.69042 1.07860 0.79733 0.64040 0.52208 0.32573 1.55674
France Western Europe 29 6.575 0.03512 1.27778 1.26038 0.94579 0.55011 0.20646 0.12332 2.21126
Argentina Latin America and Caribbean 30 6.574 0.04612 1.05351 1.24823 0.78723 0.44974 0.08484 0.11451 2.83600
... ... ... ... ... ... ... ... ... ... ... ...
Myanmar Southeastern Asia 129 4.307 0.04351 0.27108 0.70905 0.48246 0.44017 0.19034 0.79588 1.41805
Georgia Central and Eastern Europe 130 4.297 0.04221 0.74190 0.38562 0.72926 0.40577 0.38331 0.05547 1.59541
Malawi Sub-Saharan Africa 131 4.292 0.06130 0.01604 0.41134 0.22562 0.43054 0.06977 0.33128 2.80791
Sri Lanka Southern Asia 132 4.271 0.03751 0.83524 1.01905 0.70806 0.53726 0.09179 0.40828 0.67108
Cameroon Sub-Saharan Africa 133 4.252 0.04678 0.42250 0.88767 0.23402 0.49309 0.05786 0.20618 1.95071
Bulgaria Central and Eastern Europe 134 4.218 0.04828 1.01216 1.10614 0.76649 0.30587 0.00872 0.11921 0.89991
Egypt Middle East and Northern Africa 135 4.194 0.03260 0.88180 0.74700 0.61712 0.17288 0.06324 0.11291 1.59927
Yemen Middle East and Northern Africa 136 4.077 0.04367 0.54649 0.68093 0.40064 0.35571 0.07854 0.09131 1.92313
Angola Sub-Saharan Africa 137 4.033 0.04758 0.75778 0.86040 0.16683 0.10384 0.07122 0.12344 1.94939
Mali Sub-Saharan Africa 138 3.995 0.05602 0.26074 1.03526 0.20583 0.38857 0.12352 0.18798 1.79293
Congo (Brazzaville) Sub-Saharan Africa 139 3.989 0.06682 0.67866 0.66290 0.31051 0.41466 0.11686 0.12388 1.68135
Comoros Sub-Saharan Africa 140 3.956 0.04797 0.23906 0.79273 0.36315 0.22917 0.19900 0.17441 1.95812
Uganda Sub-Saharan Africa 141 3.931 0.04317 0.21102 1.13299 0.33861 0.45727 0.07267 0.29066 1.42766
Senegal Sub-Saharan Africa 142 3.904 0.03608 0.36498 0.97619 0.43540 0.36772 0.10713 0.20843 1.44395
Gabon Sub-Saharan Africa 143 3.896 0.04547 1.06024 0.90528 0.43372 0.31914 0.11091 0.06822 0.99895
Niger Sub-Saharan Africa 144 3.845 0.03602 0.06940 0.77265 0.29707 0.47692 0.15639 0.19387 1.87877
Cambodia Southeastern Asia 145 3.819 0.05069 0.46038 0.62736 0.61114 0.66246 0.07247 0.40359 0.98195
Tanzania Sub-Saharan Africa 146 3.781 0.05061 0.28520 1.00268 0.38215 0.32878 0.05747 0.34377 1.38079
Madagascar Sub-Saharan Africa 147 3.681 0.03633 0.20824 0.66801 0.46721 0.19184 0.08124 0.21333 1.85100
Central African Republic Sub-Saharan Africa 148 3.678 0.06112 0.07850 0.00000 0.06699 0.48879 0.08289 0.23835 2.72230
Chad Sub-Saharan Africa 149 3.667 0.03830 0.34193 0.76062 0.15010 0.23501 0.05269 0.18386 1.94296
Guinea Sub-Saharan Africa 150 3.656 0.03590 0.17417 0.46475 0.24009 0.37725 0.12139 0.28657 1.99172
Ivory Coast Sub-Saharan Africa 151 3.655 0.05141 0.46534 0.77115 0.15185 0.46866 0.17922 0.20165 1.41723
Burkina Faso Sub-Saharan Africa 152 3.587 0.04324 0.25812 0.85188 0.27125 0.39493 0.12832 0.21747 1.46494
Afghanistan Southern Asia 153 3.575 0.03084 0.31982 0.30285 0.30335 0.23414 0.09719 0.36510 1.95210
Rwanda Sub-Saharan Africa 154 3.465 0.03464 0.22208 0.77370 0.42864 0.59201 0.55191 0.22628 0.67042
Benin Sub-Saharan Africa 155 3.340 0.03656 0.28665 0.35386 0.31910 0.48450 0.08010 0.18260 1.63328
Syria Middle East and Northern Africa 156 3.006 0.05015 0.66320 0.47489 0.72193 0.15684 0.18906 0.47179 0.32858
Burundi Sub-Saharan Africa 157 2.905 0.08658 0.01530 0.41587 0.22396 0.11850 0.10062 0.19727 1.83302
Togo Sub-Saharan Africa 158 2.839 0.06727 0.20868 0.13995 0.28443 0.36453 0.10731 0.16681 1.56726

158 rows × 10 columns

# Sort by index level
report_2015_df2.sort_index(level=0)
Happiness Rank Happiness Score Standard Error Economy (GDP per Capita) Family Health (Life Expectancy) Freedom Trust (Government Corruption) Generosity Dystopia Residual
Region Country
Australia and New Zealand Australia 10 7.284 0.04083 1.33358 1.30923 0.93156 0.65124 0.35637 0.43562 2.26646
New Zealand 9 7.286 0.03371 1.25018 1.31967 0.90837 0.63938 0.42922 0.47501 2.26425
Central and Eastern Europe Albania 95 4.959 0.05013 0.87867 0.80434 0.81325 0.35733 0.06413 0.14272 1.89894
Armenia 127 4.350 0.04763 0.76821 0.77711 0.72990 0.19847 0.03900 0.07855 1.75873
Azerbaijan 80 5.212 0.03363 1.02389 0.93793 0.64045 0.37030 0.16065 0.07799 2.00073
Belarus 59 5.813 0.03938 1.03192 1.23289 0.73608 0.37938 0.19090 0.11046 2.13090
Bosnia and Herzegovina 96 4.949 0.06913 0.83223 0.91916 0.79081 0.09245 0.00227 0.24808 2.06367
Bulgaria 134 4.218 0.04828 1.01216 1.10614 0.76649 0.30587 0.00872 0.11921 0.89991
Croatia 62 5.759 0.04394 1.08254 0.79624 0.78805 0.25883 0.02430 0.05444 2.75414
Czech Republic 31 6.505 0.04168 1.17898 1.20643 0.84483 0.46364 0.02652 0.10686 2.67782
Estonia 73 5.429 0.04013 1.15174 1.22791 0.77361 0.44888 0.15184 0.08680 1.58782
Georgia 130 4.297 0.04221 0.74190 0.38562 0.72926 0.40577 0.38331 0.05547 1.59541
Hungary 104 4.800 0.06107 1.12094 1.20215 0.75905 0.32112 0.02758 0.12800 1.24074
Kazakhstan 54 5.855 0.04114 1.12254 1.12241 0.64368 0.51649 0.08454 0.11827 2.24729
Kosovo 69 5.589 0.05018 0.80148 0.81198 0.63132 0.24749 0.04741 0.28310 2.76579
Kyrgyzstan 77 5.286 0.03823 0.47428 1.15115 0.65088 0.43477 0.04232 0.30030 2.23270
Latvia 89 5.098 0.04640 1.11312 1.09562 0.72437 0.29671 0.06332 0.18226 1.62215
Lithuania 56 5.833 0.03843 1.14723 1.25745 0.73128 0.21342 0.01031 0.02641 2.44649
Macedonia 93 5.007 0.05376 0.91851 1.00232 0.73545 0.33457 0.05327 0.22359 1.73933
Moldova 52 5.889 0.03799 0.59448 1.01528 0.61826 0.32818 0.01615 0.20951 3.10712
Montenegro 82 5.192 0.05235 0.97438 0.90557 0.72521 0.18260 0.14296 0.16140 2.10017
Poland 60 5.791 0.04263 1.12555 1.27948 0.77903 0.53122 0.04212 0.16759 1.86565
Romania 86 5.124 0.06607 1.04345 0.88588 0.76890 0.35068 0.00649 0.13748 1.93129
Russia 64 5.716 0.03135 1.13764 1.23617 0.66926 0.36679 0.03005 0.00199 2.27394
Serbia 87 5.123 0.04864 0.92053 1.00964 0.74836 0.20107 0.02617 0.19231 2.02500
Slovakia 45 5.995 0.04267 1.16891 1.26999 0.78902 0.31751 0.03431 0.16893 2.24639
Slovenia 55 5.848 0.04251 1.18498 1.27385 0.87337 0.60855 0.03787 0.25328 1.61583
Tajikistan 106 4.786 0.03198 0.39047 0.85563 0.57379 0.47216 0.15072 0.22974 2.11399
Turkmenistan 70 5.548 0.04175 0.95847 1.22668 0.53886 0.47610 0.30844 0.16979 1.86984
Ukraine 111 4.681 0.04412 0.79907 1.20278 0.67390 0.25123 0.02961 0.15275 1.57140
... ... ... ... ... ... ... ... ... ... ... ...
Sub-Saharan Africa Somaliland region 91 5.057 0.06161 0.18847 0.95152 0.43873 0.46582 0.39928 0.50318 2.11032
South Africa 113 4.642 0.04585 0.92049 1.18468 0.27688 0.33207 0.08884 0.11973 1.71956
Sudan 118 4.550 0.06740 0.52107 1.01404 0.36878 0.10081 0.14660 0.19062 2.20857
Swaziland 101 4.867 0.08742 0.71206 1.07284 0.07566 0.30658 0.03060 0.18259 2.48676
Tanzania 146 3.781 0.05061 0.28520 1.00268 0.38215 0.32878 0.05747 0.34377 1.38079
Togo 158 2.839 0.06727 0.20868 0.13995 0.28443 0.36453 0.10731 0.16681 1.56726
Uganda 141 3.931 0.04317 0.21102 1.13299 0.33861 0.45727 0.07267 0.29066 1.42766
Zambia 85 5.129 0.06988 0.47038 0.91612 0.29924 0.48827 0.12468 0.19591 2.63430
Zimbabwe 115 4.610 0.04290 0.27100 1.03276 0.33475 0.25861 0.08079 0.18987 2.44191
Western Europe Austria 13 7.200 0.03751 1.33723 1.29704 0.89042 0.62433 0.18676 0.33088 2.53320
Belgium 19 6.937 0.03595 1.30782 1.28566 0.89667 0.58450 0.22540 0.22250 2.41484
Cyprus 67 5.689 0.05580 1.20813 0.89318 0.92356 0.40672 0.06146 0.30638 1.88931
Denmark 3 7.527 0.03328 1.32548 1.36058 0.87464 0.64938 0.48357 0.34139 2.49204
Finland 6 7.406 0.03140 1.29025 1.31826 0.88911 0.64169 0.41372 0.23351 2.61955
France 29 6.575 0.03512 1.27778 1.26038 0.94579 0.55011 0.20646 0.12332 2.21126
Germany 26 6.750 0.01848 1.32792 1.29937 0.89186 0.61477 0.21843 0.28214 2.11569
Greece 102 4.857 0.05062 1.15406 0.92933 0.88213 0.07699 0.01397 0.00000 1.80101
Iceland 2 7.561 0.04884 1.30232 1.40223 0.94784 0.62877 0.14145 0.43630 2.70201
Ireland 18 6.940 0.03676 1.33596 1.36948 0.89533 0.61777 0.28703 0.45901 1.97570
Italy 50 5.948 0.03914 1.25114 1.19777 0.95446 0.26236 0.02901 0.22823 2.02518
Luxembourg 17 6.946 0.03499 1.56391 1.21963 0.91894 0.61583 0.37798 0.28034 1.96961
Malta 37 6.302 0.04206 1.20740 1.30203 0.88721 0.60365 0.13586 0.51752 1.64880
Netherlands 7 7.378 0.02799 1.32944 1.28017 0.89284 0.61576 0.31814 0.47610 2.46570
North Cyprus 66 5.695 0.05635 1.20806 1.07008 0.92356 0.49027 0.14280 0.26169 1.59888
Norway 4 7.522 0.03880 1.45900 1.33095 0.88521 0.66973 0.36503 0.34699 2.46531
Portugal 88 5.102 0.04802 1.15991 1.13935 0.87519 0.51469 0.01078 0.13719 1.26462
Spain 36 6.329 0.03468 1.23011 1.31379 0.95562 0.45951 0.06398 0.18227 2.12367
Sweden 8 7.364 0.03157 1.33171 1.28907 0.91087 0.65980 0.43844 0.36262 2.37119
Switzerland 1 7.587 0.03411 1.39651 1.34951 0.94143 0.66557 0.41978 0.29678 2.51738
United Kingdom 21 6.867 0.01866 1.26637 1.28548 0.90943 0.59625 0.32067 0.51912 1.96994

158 rows × 10 columns

7. Data Cleaning

# Load the video playback log; most 'paused' and 'volume' values are missing (NaN)
log_data = pd.read_csv('log.csv')
log_data
          time    user          video  playback position  paused  volume
 0  1469974424  cheryl     intro.html                  5   False    10.0
 1  1469974454  cheryl     intro.html                  6     NaN     NaN
 2  1469974544  cheryl     intro.html                  9     NaN     NaN
 3  1469974574  cheryl     intro.html                 10     NaN     NaN
 4  1469977514     bob     intro.html                  1     NaN     NaN
 5  1469977544     bob     intro.html                  1     NaN     NaN
 6  1469977574     bob     intro.html                  1     NaN     NaN
 7  1469977604     bob     intro.html                  1     NaN     NaN
 8  1469974604  cheryl     intro.html                 11     NaN     NaN
 9  1469974694  cheryl     intro.html                 14     NaN     NaN
10  1469974724  cheryl     intro.html                 15     NaN     NaN
11  1469974454     sue  advanced.html                 24     NaN     NaN
12  1469974524     sue  advanced.html                 25     NaN     NaN
13  1469974424     sue  advanced.html                 23   False    10.0
14  1469974554     sue  advanced.html                 26     NaN     NaN
15  1469974624     sue  advanced.html                 27     NaN     NaN
16  1469974654     sue  advanced.html                 28     NaN     5.0
17  1469974724     sue  advanced.html                 29     NaN     NaN
18  1469974484  cheryl     intro.html                  7     NaN     NaN
19  1469974514  cheryl     intro.html                  8     NaN     NaN
20  1469974754     sue  advanced.html                 30     NaN     NaN
21  1469974824     sue  advanced.html                 31     NaN     NaN
22  1469974854     sue  advanced.html                 32     NaN     NaN
23  1469974924     sue  advanced.html                 33     NaN     NaN
24  1469977424     bob     intro.html                  1    True    10.0
25  1469977454     bob     intro.html                  1     NaN     NaN
26  1469977484     bob     intro.html                  1     NaN     NaN
27  1469977634     bob     intro.html                  1     NaN     NaN
28  1469977664     bob     intro.html                  1     NaN     NaN
29  1469974634  cheryl     intro.html                 12     NaN     NaN
30  1469974664  cheryl     intro.html                 13     NaN     NaN
31  1469977694     bob     intro.html                  1     NaN     NaN
32  1469977724     bob     intro.html                  1     NaN     NaN
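Before choosing a cleaning strategy, it can help to count how many values are missing in each column. The sketch below assumes a small hand-copied subset of the rows shown above rather than the real log.csv; the name `sample` is only illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative subset of the log above (not the full log.csv)
sample = pd.DataFrame({
    'time': [1469974424, 1469974454, 1469974424, 1469974654, 1469977424, 1469977454],
    'user': ['cheryl', 'cheryl', 'sue', 'sue', 'bob', 'bob'],
    'video': ['intro.html', 'intro.html', 'advanced.html',
              'advanced.html', 'intro.html', 'intro.html'],
    'playback position': [5, 6, 23, 28, 1, 1],
    'paused': [False, np.nan, False, np.nan, True, np.nan],
    'volume': [10.0, np.nan, 10.0, 5.0, 10.0, np.nan],
})

# isna() marks each missing cell as True; sum() then counts them per column
missing_counts = sample.isna().sum()
print(missing_counts)
```

In this subset only `paused` and `volume` contain gaps, which matches the pattern visible in the full log.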
# Index the log by (time, user) and sort it so the rows are in chronological order
log_data.set_index(['time', 'user'], inplace=True)
log_data.sort_index(inplace=True)
log_data
                           video  playback position  paused  volume
time       user
1469974424 cheryl     intro.html                  5   False    10.0
           sue     advanced.html                 23   False    10.0
1469974454 cheryl     intro.html                  6     NaN     NaN
           sue     advanced.html                 24     NaN     NaN
1469974484 cheryl     intro.html                  7     NaN     NaN
1469974514 cheryl     intro.html                  8     NaN     NaN
1469974524 sue     advanced.html                 25     NaN     NaN
1469974544 cheryl     intro.html                  9     NaN     NaN
1469974554 sue     advanced.html                 26     NaN     NaN
1469974574 cheryl     intro.html                 10     NaN     NaN
1469974604 cheryl     intro.html                 11     NaN     NaN
1469974624 sue     advanced.html                 27     NaN     NaN
1469974634 cheryl     intro.html                 12     NaN     NaN
1469974654 sue     advanced.html                 28     NaN     5.0
1469974664 cheryl     intro.html                 13     NaN     NaN
1469974694 cheryl     intro.html                 14     NaN     NaN
1469974724 cheryl     intro.html                 15     NaN     NaN
           sue     advanced.html                 29     NaN     NaN
1469974754 sue     advanced.html                 30     NaN     NaN
1469974824 sue     advanced.html                 31     NaN     NaN
1469974854 sue     advanced.html                 32     NaN     NaN
1469974924 sue     advanced.html                 33     NaN     NaN
1469977424 bob        intro.html                  1    True    10.0
1469977454 bob        intro.html                  1     NaN     NaN
1469977484 bob        intro.html                  1     NaN     NaN
1469977514 bob        intro.html                  1     NaN     NaN
1469977544 bob        intro.html                  1     NaN     NaN
1469977574 bob        intro.html                  1     NaN     NaN
1469977604 bob        intro.html                  1     NaN     NaN
1469977634 bob        intro.html                  1     NaN     NaN
1469977664 bob        intro.html                  1     NaN     NaN
1469977694 bob        intro.html                  1     NaN     NaN
1469977724 bob        intro.html                  1     NaN     NaN
# Replace every missing value with 0 (returns a new DataFrame; log_data itself is unchanged)
log_data.fillna(0)
                           video  playback position  paused  volume
time       user
1469974424 cheryl     intro.html                  5   False    10.0
           sue     advanced.html                 23   False    10.0
1469974454 cheryl     intro.html                  6       0     0.0
           sue     advanced.html                 24       0     0.0
1469974484 cheryl     intro.html                  7       0     0.0
1469974514 cheryl     intro.html                  8       0     0.0
1469974524 sue     advanced.html                 25       0     0.0
1469974544 cheryl     intro.html                  9       0     0.0
1469974554 sue     advanced.html                 26       0     0.0
1469974574 cheryl     intro.html                 10       0     0.0
1469974604 cheryl     intro.html                 11       0     0.0
1469974624 sue     advanced.html                 27       0     0.0
1469974634 cheryl     intro.html                 12       0     0.0
1469974654 sue     advanced.html                 28       0     5.0
1469974664 cheryl     intro.html                 13       0     0.0
1469974694 cheryl     intro.html                 14       0     0.0
1469974724 cheryl     intro.html                 15       0     0.0
           sue     advanced.html                 29       0     0.0
1469974754 sue     advanced.html                 30       0     0.0
1469974824 sue     advanced.html                 31       0     0.0
1469974854 sue     advanced.html                 32       0     0.0
1469974924 sue     advanced.html                 33       0     0.0
1469977424 bob        intro.html                  1    True    10.0
1469977454 bob        intro.html                  1       0     0.0
1469977484 bob        intro.html                  1       0     0.0
1469977514 bob        intro.html                  1       0     0.0
1469977544 bob        intro.html                  1       0     0.0
1469977574 bob        intro.html                  1       0     0.0
1469977604 bob        intro.html                  1       0     0.0
1469977634 bob        intro.html                  1       0     0.0
1469977664 bob        intro.html                  1       0     0.0
1469977694 bob        intro.html                  1       0     0.0
1469977724 bob        intro.html                  1       0     0.0
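As the output above shows, fillna(0) also writes 0 into the boolean `paused` column. One possible alternative is to pass a dict that gives each column its own fill value. The sketch below uses a few rows copied from the log; the defaults chosen here (False for `paused`, 0.0 for `volume`) are assumptions for illustration, not values implied by the data.

```python
import numpy as np
import pandas as pd

# Illustrative rows copied from the log above (not the full log.csv)
sample = pd.DataFrame({
    'user': ['cheryl', 'cheryl', 'bob'],
    'paused': [False, np.nan, True],
    'volume': [10.0, np.nan, 10.0],
})

# A dict maps column name -> fill value, so each column gets a suitable default
filled = sample.fillna({'paused': False, 'volume': 0.0})
print(filled)
```

Columns that are not listed in the dict are left untouched.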
# Drop every row that contains at least one missing value
log_data.dropna()
                           video  playback position  paused  volume
time       user
1469974424 cheryl     intro.html                  5   False    10.0
           sue     advanced.html                 23   False    10.0
1469977424 bob        intro.html                  1    True    10.0
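Only three rows survive dropna() above, because a row is dropped as soon as any column is missing. If only some columns matter, the `subset` parameter restricts the check to them. A minimal sketch on a hand-copied subset of the log (the name `sample` is illustrative):

```python
import numpy as np
import pandas as pd

# Illustrative subset of the log above (not the full log.csv)
sample = pd.DataFrame({
    'user': ['cheryl', 'cheryl', 'sue', 'sue', 'bob', 'bob'],
    'paused': [False, np.nan, False, np.nan, True, np.nan],
    'volume': [10.0, np.nan, 10.0, 5.0, 10.0, np.nan],
})

# Default: drop a row if ANY column is missing
all_complete = sample.dropna()

# Only require 'volume' to be present; a missing 'paused' is tolerated
has_volume = sample.dropna(subset=['volume'])

print(len(all_complete), len(has_volume))
```

Here the row with volume 5.0 is kept by the second call even though its `paused` value is missing.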
# Forward fill: copy the last valid value down into each NaN, following row order
log_data.ffill()
                           video  playback position  paused  volume
time       user
1469974424 cheryl     intro.html                  5   False    10.0
           sue     advanced.html                 23   False    10.0
1469974454 cheryl     intro.html                  6   False    10.0
           sue     advanced.html                 24   False    10.0
1469974484 cheryl     intro.html                  7   False    10.0
1469974514 cheryl     intro.html                  8   False    10.0
1469974524 sue     advanced.html                 25   False    10.0
1469974544 cheryl     intro.html                  9   False    10.0
1469974554 sue     advanced.html                 26   False    10.0
1469974574 cheryl     intro.html                 10   False    10.0
1469974604 cheryl     intro.html                 11   False    10.0
1469974624 sue     advanced.html                 27   False    10.0
1469974634 cheryl     intro.html                 12   False    10.0
1469974654 sue     advanced.html                 28   False     5.0
1469974664 cheryl     intro.html                 13   False     5.0
1469974694 cheryl     intro.html                 14   False     5.0
1469974724 cheryl     intro.html                 15   False     5.0
           sue     advanced.html                 29   False     5.0
1469974754 sue     advanced.html                 30   False     5.0
1469974824 sue     advanced.html                 31   False     5.0
1469974854 sue     advanced.html                 32   False     5.0
1469974924 sue     advanced.html                 33   False     5.0
1469977424 bob        intro.html                  1    True    10.0
1469977454 bob        intro.html                  1    True    10.0
1469977484 bob        intro.html                  1    True    10.0
1469977514 bob        intro.html                  1    True    10.0
1469977544 bob        intro.html                  1    True    10.0
1469977574 bob        intro.html                  1    True    10.0
1469977604 bob        intro.html                  1    True    10.0
1469977634 bob        intro.html                  1    True    10.0
1469977664 bob        intro.html                  1    True    10.0
1469977694 bob        intro.html                  1    True    10.0
1469977724 bob        intro.html                  1    True    10.0
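Note that in the output above cheryl's row at 1469974664 receives volume 5.0, a value that was recorded for sue: a plain ffill() follows row order and does not care which user a row belongs to. Whether that is desirable depends on what the log represents. Assuming each user has an independent player, one option is to forward fill within each user separately, sketched here on a hand-copied subset of the log (the names `sample`, `plain` and `per_user` are illustrative):

```python
import numpy as np
import pandas as pd

# Illustrative subset of the log above (not the full log.csv)
sample = pd.DataFrame({
    'time': [1469974424, 1469974424, 1469974454, 1469974654, 1469974664],
    'user': ['cheryl', 'sue', 'cheryl', 'sue', 'cheryl'],
    'volume': [10.0, 10.0, np.nan, 5.0, np.nan],
}).set_index(['time', 'user']).sort_index()

# Plain forward fill follows row order, so values cross user boundaries
plain = sample['volume'].ffill()

# Forward fill within each user's own rows only
per_user = sample.groupby(level='user')['volume'].ffill()

print(plain.loc[(1469974664, 'cheryl')])     # 5.0, copied from sue's row
print(per_user.loc[(1469974664, 'cheryl')])  # 10.0, cheryl's own last value
```

The grouped result keeps the original (time, user) index, so it can be assigned back to a column of the same frame.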
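# Backward fill: copy the next valid value up into each NaN; trailing NaNs have nothing to copy from
log_data.bfill()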
                           video  playback position  paused  volume
time       user
1469974424 cheryl     intro.html                  5   False    10.0
           sue     advanced.html                 23   False    10.0
1469974454 cheryl     intro.html                  6    True     5.0
           sue     advanced.html                 24    True     5.0
1469974484 cheryl     intro.html                  7    True     5.0
1469974514 cheryl     intro.html                  8    True     5.0
1469974524 sue     advanced.html                 25    True     5.0
1469974544 cheryl     intro.html                  9    True     5.0
1469974554 sue     advanced.html                 26    True     5.0
1469974574 cheryl     intro.html                 10    True     5.0
1469974604 cheryl     intro.html                 11    True     5.0
1469974624 sue     advanced.html                 27    True     5.0
1469974634 cheryl     intro.html                 12    True     5.0
1469974654 sue     advanced.html                 28    True     5.0
1469974664 cheryl     intro.html                 13    True    10.0
1469974694 cheryl     intro.html                 14    True    10.0
1469974724 cheryl     intro.html                 15    True    10.0
           sue     advanced.html                 29    True    10.0
1469974754 sue     advanced.html                 30    True    10.0
1469974824 sue     advanced.html                 31    True    10.0
1469974854 sue     advanced.html                 32    True    10.0
1469974924 sue     advanced.html                 33    True    10.0
1469977424 bob        intro.html                  1    True    10.0
1469977454 bob        intro.html                  1     NaN     NaN
1469977484 bob        intro.html                  1     NaN     NaN
1469977514 bob        intro.html                  1     NaN     NaN
1469977544 bob        intro.html                  1     NaN     NaN
1469977574 bob        intro.html                  1     NaN     NaN
1469977604 bob        intro.html                  1     NaN     NaN
1469977634 bob        intro.html                  1     NaN     NaN
1469977664 bob        intro.html                  1     NaN     NaN
1469977694 bob        intro.html                  1     NaN     NaN
1469977724 bob        intro.html                  1     NaN     NaN
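In the output above, bob's rows after 1469977424 stay NaN because bfill() has no later value to copy from; ffill() has the mirror-image limitation for NaNs at the start. A minimal sketch on a made-up Series (the values are hypothetical, chosen only to show the edge cases) illustrates both, plus chaining the two calls:

```python
import numpy as np
import pandas as pd

# Hypothetical values with a gap at the start, in the middle and at the end
volume = pd.Series([np.nan, 10.0, np.nan, 5.0, np.nan])

forward = volume.ffill()        # the leading NaN remains
backward = volume.bfill()       # the trailing NaN remains
both = volume.ffill().bfill()   # forward fill first, then back-fill what is left

print(both.tolist())
```

Chaining is only one possible choice; whether back-filling the leading gap is meaningful depends on the data.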