tynbl.github.io

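Case Study 2: Nutritional Analysis of the McDonald's Menu

1. Project Description:

Over the past 30 years, McDonald's has come to recognize the importance of a balanced diet. The company has spent years exploring and developing healthier menus. It has run a range of studies and trials, and has completed laboratory certification of its official suppliers and other sources. Rigorous research into McDonald's food has shown that it can indeed be part of a healthy lifestyle. Some qualified professionals at the American Dietetic Association have found that McDonald's items such as salads, soups, grilled sandwiches, and fruit pies are genuinely healthy.

Although McDonald's serves plenty of hamburgers and French fries, eating them in moderation causes no problems. But because the food is so tasty, many children eat large amounts of it, and that is where the problems come from. Eating too much of anything is bad for your health. Many people are interested in the new McDonald's nutrition guide because they need to lose weight yet often eat this kind of fast food. The issue, however, is not as simple as controlling calorie intake.

This project analyzes the nutritional content of the McDonald's menu and lets the data speak.

2. Dataset Description:

3. Project Tasks:

4. Implementation: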

# Import the required packages
import csv
import os
import numpy as np
import pandas as pd
# Path to the dataset
dataset_path = '../data'
datafile = os.path.join(dataset_path, 'menu.csv')
# Load the data
menu_data = pd.read_csv(datafile)

# Preview the data
menu_data.head()
Category Item Serving Size Calories Calories from Fat Total Fat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
0 Breakfast Egg McMuffin 4.8 oz (136 g) 300 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Breakfast Egg White Delight 4.8 oz (135 g) 250 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8
2 Breakfast Sausage McMuffin 3.9 oz (111 g) 370 200 23.0 35 8.0 42 0.0 ... 29 10 4 17 2 14 8 0 25 10
3 Breakfast Sausage McMuffin with Egg 5.7 oz (161 g) 450 250 28.0 43 10.0 52 0.0 ... 30 10 4 17 2 21 15 0 30 15
4 Breakfast Sausage McMuffin with Egg Whites 5.7 oz (161 g) 400 210 23.0 35 8.0 42 0.0 ... 30 10 4 17 2 21 6 0 25 10

5 rows × 24 columns

# Dataset summary: column types and non-null counts
menu_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 260 entries, 0 to 259
Data columns (total 24 columns):
Category                         260 non-null object
Item                             260 non-null object
Serving Size                     260 non-null object
Calories                         260 non-null int64
Calories from Fat                260 non-null int64
Total Fat                        260 non-null float64
Total Fat (% Daily Value)        260 non-null int64
Saturated Fat                    260 non-null float64
Saturated Fat (% Daily Value)    260 non-null int64
Trans Fat                        260 non-null float64
Cholesterol                      260 non-null int64
Cholesterol (% Daily Value)      260 non-null int64
Sodium                           260 non-null int64
Sodium (% Daily Value)           260 non-null int64
Carbohydrates                    260 non-null int64
Carbohydrates (% Daily Value)    260 non-null int64
Dietary Fiber                    260 non-null int64
Dietary Fiber (% Daily Value)    260 non-null int64
Sugars                           260 non-null int64
Protein                          260 non-null int64
Vitamin A (% Daily Value)        260 non-null int64
Vitamin C (% Daily Value)        260 non-null int64
Calcium (% Daily Value)          260 non-null int64
Iron (% Daily Value)             260 non-null int64
dtypes: float64(3), int64(18), object(3)
memory usage: 48.8+ KB
# Summary statistics for the numeric columns
menu_data.describe()
Calories Calories from Fat Total Fat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat Cholesterol Cholesterol (% Daily Value) Sodium ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
count 260.000000 260.000000 260.000000 260.000000 260.000000 260.000000 260.000000 260.000000 260.000000 260.000000 ... 260.000000 260.000000 260.000000 260.000000 260.000000 260.000000 260.000000 260.000000 260.000000 260.000000
mean 368.269231 127.096154 14.165385 21.815385 6.007692 29.965385 0.203846 54.942308 18.392308 495.750000 ... 47.346154 15.780769 1.630769 6.530769 29.423077 13.338462 13.426923 8.534615 20.973077 7.734615
std 240.269886 127.875914 14.205998 21.885199 5.321873 26.639209 0.429133 87.269257 29.091653 577.026323 ... 28.252232 9.419544 1.567717 6.307057 28.679797 11.426146 24.366381 26.345542 17.019953 8.723263
min 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
25% 210.000000 20.000000 2.375000 3.750000 1.000000 4.750000 0.000000 5.000000 2.000000 107.500000 ... 30.000000 10.000000 0.000000 0.000000 5.750000 4.000000 2.000000 0.000000 6.000000 0.000000
50% 340.000000 100.000000 11.000000 17.000000 5.000000 24.000000 0.000000 35.000000 11.000000 190.000000 ... 44.000000 15.000000 1.000000 5.000000 17.500000 12.000000 8.000000 0.000000 20.000000 4.000000
75% 500.000000 200.000000 22.250000 35.000000 10.000000 48.000000 0.000000 65.000000 21.250000 865.000000 ... 60.000000 20.000000 3.000000 10.000000 48.000000 19.000000 15.000000 4.000000 30.000000 15.000000
max 1880.000000 1060.000000 118.000000 182.000000 20.000000 102.000000 2.500000 575.000000 192.000000 3600.000000 ... 141.000000 47.000000 7.000000 28.000000 128.000000 87.000000 170.000000 240.000000 70.000000 40.000000

8 rows × 21 columns
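
The `info()` output above reports 260 non-null values in every column, so this dataset needs no imputation. As a minimal sketch of how that check could be done programmatically, the snippet below counts missing values per column. The `toy` DataFrame and its values are made up for illustration and are not taken from `menu.csv`.

```python
import pandas as pd

# Hypothetical toy data standing in for menu.csv; the values are made up.
toy = pd.DataFrame({
    'Item': ['A', 'B', 'C'],
    'Calories': [300, None, 450],
    'Sugars': [3, 2, 2],
})

# Count the missing values in each column.
missing_per_col = toy.isnull().sum()
print(missing_per_col)

# Names of the columns that contain at least one missing value.
cols_with_missing = missing_per_col[missing_per_col > 0].index.tolist()
print(cols_with_missing)
```

Applied to `menu_data`, the same two calls should yield an empty list, given the non-null counts shown by `info()`.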

4.1 Analysis by item

used_cols = ['Calories', 'Calories from Fat', 'Total Fat', 'Cholesterol', 'Sugars']
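# Item with the highest value for each nutrient.
# idxmax() returns the index label; in recent pandas versions
# Series.argmax() returns an integer position instead.
max_idxs = [menu_data[col].idxmax() for col in used_cols]
for col, max_idx in zip(used_cols, max_idxs):
    print('Item with the highest {}: {}'.format(col, menu_data.loc[max_idx, 'Item']))
Item with the highest Calories: Chicken McNuggets (40 piece)
Item with the highest Calories from Fat: Chicken McNuggets (40 piece)
Item with the highest Total Fat: Chicken McNuggets (40 piece)
Item with the highest Cholesterol: Big Breakfast with Hotcakes (Regular Biscuit)
Item with the highest Sugars: McFlurry with M&M’s Candies (Medium)
# Item with the lowest value for each nutrient.
# When several items tie for the minimum, only the first one is reported.
min_idxs = [menu_data[col].idxmin() for col in used_cols]
for col, min_idx in zip(used_cols, min_idxs):
    print('Item with the lowest {}: {}'.format(col, menu_data.loc[min_idx, 'Item']))
Item with the lowest Calories: Diet Coke (Small)
Item with the lowest Calories from Fat: Side Salad
Item with the lowest Total Fat: Side Salad
Item with the lowest Cholesterol: Hash Brown
Item with the lowest Sugars: Hash Brown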
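
The lookup above reports a single item per nutrient, but the column minimum is 0 for every nutrient (see the `describe()` output), so several items may share it. The sketch below shows one way to list every tied item rather than only the first. The helper `items_at_min` and the `toy` data are hypothetical names and values introduced for illustration.

```python
import pandas as pd

# Hypothetical toy data; item names and values are made up.
toy = pd.DataFrame({
    'Item': ['Cola', 'Diet Cola', 'Water', 'Fries'],
    'Calories': [140, 0, 0, 230],
})

def items_at_min(df, col):
    """Return every item whose value in `col` equals the column minimum."""
    return df.loc[df[col] == df[col].min(), 'Item'].tolist()

# idxmin() reports only the first row that reaches the minimum.
first_only = toy.loc[toy['Calories'].idxmin(), 'Item']
# The boolean mask keeps all rows that reach it.
all_ties = items_at_min(toy, 'Calories')

print(first_only)
print(all_ties)
```

The same mask with `max()` in place of `min()` would list ties for the highest value.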

4.2 Analysis by menu category

# Number of items in each menu category
cat_grouped = menu_data.groupby('Category')
print('Number of items per menu category:')
print(cat_grouped.size().sort_values(ascending=False))
Number of items per menu category:
Category
Coffee & Tea          95
Breakfast             42
Smoothies & Shakes    28
Chicken & Fish        27
Beverages             27
Beef & Pork           15
Snacks & Sides        13
Desserts               7
Salads                 6
dtype: int64
# Mean nutritional values for each menu category
print('Mean nutritional values per menu category:')

used_cols = ['Calories', 'Calories from Fat', 'Total Fat', 'Cholesterol', 'Sugars']
print(cat_grouped[used_cols].mean())
Mean nutritional values per menu category:
                      Calories  Calories from Fat  Total Fat  Cholesterol  \
Category                                                                    
Beef & Pork         494.000000         224.666667  24.866667    87.333333   
Beverages           113.703704           0.740741   0.092593     0.555556   
Breakfast           526.666667         248.928571  27.690476   152.857143   
Chicken & Fish      552.962963         242.222222  26.962963    75.370370   
Coffee & Tea        283.894737          71.105263   8.021053    27.263158   
Desserts            222.142857          64.285714   7.357143    15.000000   
Salads              270.000000         108.333333  11.750000    51.666667   
Smoothies & Shakes  531.428571         127.678571  14.125000    45.000000   
Snacks & Sides      245.769231          94.615385  10.538462    18.461538   

                       Sugars  
Category                       
Beef & Pork          8.800000  
Beverages           27.851852  
Breakfast            8.261905  
Chicken & Fish       7.333333  
Coffee & Tea        39.610526  
Desserts            26.142857  
Salads               6.833333  
Smoothies & Shakes  77.892857  
Snacks & Sides       4.076923  
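# Menu category with the highest mean value for each nutrient.
# idxmax() returns the category label rather than an integer position.
max_cats = [cat_grouped[col].mean().idxmax() for col in used_cols]
for col, cat in zip(used_cols, max_cats):
    print('Menu category with the highest {}: {}'.format(col, cat))
Menu category with the highest Calories: Chicken & Fish
Menu category with the highest Calories from Fat: Breakfast
Menu category with the highest Total Fat: Breakfast
Menu category with the highest Cholesterol: Breakfast
Menu category with the highest Sugars: Smoothies & Shakes
# Menu category with the lowest mean value for each nutrient
min_cats = [cat_grouped[col].mean().idxmin() for col in used_cols]
for col, cat in zip(used_cols, min_cats):
    print('Menu category with the lowest {}: {}'.format(col, cat))
Menu category with the lowest Calories: Beverages
Menu category with the lowest Calories from Fat: Beverages
Menu category with the lowest Total Fat: Beverages
Menu category with the lowest Cholesterol: Beverages
Menu category with the lowest Sugars: Snacks & Sides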
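
The per-column loops above can also be written as a single call, because `DataFrame.idxmax()` and `DataFrame.idxmin()` return one index label per column. The sketch below illustrates this on a small made-up DataFrame. The `toy` data and the variable names are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy data; categories reuse names from the menu, values are made up.
toy = pd.DataFrame({
    'Category': ['Breakfast', 'Breakfast', 'Beverages', 'Beverages', 'Desserts'],
    'Calories': [300, 450, 0, 140, 250],
    'Sugars': [3, 2, 0, 39, 30],
})

# Mean of each nutrient per category (rows are indexed by category label).
means = toy.groupby('Category')[['Calories', 'Sugars']].mean()

# One category label per nutrient column.
highest = means.idxmax()
lowest = means.idxmin()

print(highest)
print(lowest)
```

Because the grouped result is indexed by category, the labels returned here are category names, which is what the printed report needs.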

4.3 Analyzing serving sizes by item and menu category

menu_data['Serving Size'].head()
0    4.8 oz (136 g)
1    4.8 oz (135 g)
2    3.9 oz (111 g)
3    5.7 oz (161 g)
4    5.7 oz (161 g)
Name: Serving Size, dtype: object
# Filter the data: keep only items whose serving size contains 'g'
sel_menu_data = menu_data[menu_data['Serving Size'].str.contains('g')].copy()

def proc_size_str(size_str):
    """
        Parse a serving-size string and return the weight in grams.
    """
    # Take the text between '(' and the first 'g', e.g. '136 ' in '4.8 oz (136 g)'
    start_idx = size_str.index('(') + 1
    end_idx = size_str.index('g')
    size_val = size_str[start_idx : end_idx]
    return float(size_val)

sel_menu_data['Size'] = sel_menu_data['Serving Size'].apply(proc_size_str)
sel_menu_data.head()
Category Item Serving Size Calories Calories from Fat Total Fat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value) Size
0 Breakfast Egg McMuffin 4.8 oz (136 g) 300 120 13.0 20 5.0 25 0.0 ... 10 4 17 3 17 10 0 25 15 136.0
1 Breakfast Egg White Delight 4.8 oz (135 g) 250 70 8.0 12 3.0 15 0.0 ... 10 4 17 3 18 6 0 25 8 135.0
2 Breakfast Sausage McMuffin 3.9 oz (111 g) 370 200 23.0 35 8.0 42 0.0 ... 10 4 17 2 14 8 0 25 10 111.0
3 Breakfast Sausage McMuffin with Egg 5.7 oz (161 g) 450 250 28.0 43 10.0 52 0.0 ... 10 4 17 2 21 15 0 30 15 161.0
4 Breakfast Sausage McMuffin with Egg Whites 5.7 oz (161 g) 400 210 23.0 35 8.0 42 0.0 ... 10 4 17 2 21 6 0 25 10 161.0

5 rows × 25 columns
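
The parser above relies on the first `'g'` in the string being the gram unit. As a hedged alternative, the sketch below uses a regular expression that only matches a number followed by `g` inside parentheses, and returns NaN otherwise. The names `GRAMS_PATTERN` and `grams_from_serving_size`, and the sample strings other than the `oz (… g)` form shown above, are assumptions for illustration.

```python
import re
import pandas as pd

# Matches e.g. "(136 g)" or "(29.5 g)" and captures the number.
GRAMS_PATTERN = re.compile(r'\((\d+(?:\.\d+)?)\s*g\)')

def grams_from_serving_size(size_str):
    """Return the gram weight in a serving-size string, or NaN if none is found."""
    match = GRAMS_PATTERN.search(size_str)
    return float(match.group(1)) if match else float('nan')

# Hypothetical sample strings; only the "oz (… g)" form appears in the preview above.
sizes = pd.Series(['4.8 oz (136 g)', '16 fl oz cup', '1 carton (236 ml)', '3.9 oz (111 g)'])
grams = sizes.apply(grams_from_serving_size)

print(grams)
```

Rows without a gram weight come back as NaN, so they can be dropped with `dropna()` instead of being filtered out by a substring test beforehand.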

# Items with the largest and smallest serving sizes.
# idxmax()/idxmin() return index labels, so look the rows up with .loc.
max_idx = sel_menu_data['Size'].idxmax()
print('Largest serving: {}, {}g'.format(sel_menu_data.loc[max_idx, 'Item'], sel_menu_data['Size'].max()))

min_idx = sel_menu_data['Size'].idxmin()
print('Smallest serving: {}, {}g'.format(sel_menu_data.loc[min_idx, 'Item'], sel_menu_data['Size'].min()))
Largest serving: Chicken McNuggets (40 piece), 646.0g
Smallest serving: Kids Ice Cream Cone, 29.0g
# Categories with the largest and smallest mean serving sizes
sel_cat_grouped = sel_menu_data.groupby('Category')
size_means = sel_cat_grouped['Size'].mean()

print('Category with the largest mean serving: {}, {}g'.format(size_means.idxmax(),
                                                               size_means.max()))

print('Category with the smallest mean serving: {}, {}g'.format(size_means.idxmin(),
                                                                size_means.min()))
Category with the largest mean serving: Smoothies & Shakes, 304.75g
Category with the smallest mean serving: Desserts, 101.57142857142857g

5. Project Summary