Exercise 5: Pandas
In this exercise, we move from our simplified “teaching” dataset to a broader download from the Penn World Table (PWT). Download the file pwt_data_ex5.csv and save it in a folder named data. This version contains more countries and a wider range of macroeconomic variables:
| Variable | Full Name / Definition |
|---|---|
| iso3 | 3-letter ISO Country Code |
| Country | Full Country Name |
| year | Observation Year |
| pop | Population (in millions) |
| rgdpna | Real GDP at constant 2021 national prices (in mil. 2021USD) |
| rnna | Capital stock at constant 2021 national prices (in mil. 2021USD) |
| emp | Number of persons engaged (in millions) |
| avh | Average annual hours worked by persons engaged |
| hc | Human capital index (based on years of schooling) |
| ctfp | TFP level at current PPPs (USA = 1) |
| labsh | Share of labor compensation in GDP |
| pl_con | Price level of household consumption |
Import Packages
import pandas as pd
Exercise 1: Data Loading and Initial Inspection
In this exercise, you will practice loading a comprehensive macroeconomic dataset into a DataFrame and performing an initial inspection to understand its dimensionality, variable types, and the unique entities it contains.
Your Tasks:
- Load the data: Read the file data/pwt_data_ex5.csv into a DataFrame named df.
- Size Check: How many observations (rows) and variables (columns) does this dataset have?
- Variable Types: Use a method to see which columns are categorical (strings) and which are numeric.
- Country Count: How many unique countries are represented in this dataset?
- Summary Statistics: Display the mean and standard deviation for the numeric variables.
# 1. Load the data
df = pd.read_csv("data/pwt_data_ex5.csv")
df.head()
| | iso3 | Country | year | avh | ctfp | emp | hc | labsh | pl_con | pop | rgdpna | rnna |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ABW | Aruba | 2000 | NaN | NaN | 0.041918 | NaN | 0.645106 | 0.503648 | 0.088761 | 3982.696289 | 11437.366211 |
| 1 | ABW | Aruba | 2001 | NaN | NaN | 0.042579 | NaN | 0.645106 | 0.518854 | 0.090305 | 3918.971436 | 11965.093750 |
| 2 | ABW | Aruba | 2002 | NaN | NaN | 0.043016 | NaN | 0.645106 | 0.532409 | 0.091379 | 3923.781006 | 12594.469727 |
| 3 | ABW | Aruba | 2003 | NaN | NaN | 0.043385 | NaN | 0.645106 | 0.538941 | 0.092310 | 3944.353027 | 13318.708008 |
| 4 | ABW | Aruba | 2004 | NaN | NaN | 0.043739 | NaN | 0.645106 | 0.552747 | 0.093213 | 4243.611328 | 14101.028320 |
You will immediately notice the NaN entries in the DataFrame. In pandas, NaN stands for “Not a Number” and is commonly used to represent missing or undefined data in a Series or DataFrame.
For this exercise we’re going to ignore them. Handling missing data is something we’ll cover in week 6.
# 2. Size Check
print(f"Dataset Shape: {df.shape}")
# df.shape returns (rows, columns)
Dataset Shape: (4422, 12)
# 3. Variable Types and Non-Null counts
# .info() is perfect for seeing Dtypes and identifying missing data at a glance
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4422 entries, 0 to 4421
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 iso3 4422 non-null object
1 Country 4422 non-null object
2 year 4422 non-null int64
3 avh 2812 non-null float64
4 ctfp 2880 non-null float64
5 emp 4302 non-null float64
6 hc 3480 non-null float64
7 labsh 3350 non-null float64
8 pl_con 4422 non-null float64
9 pop 4422 non-null float64
10 rgdpna 4422 non-null float64
11 rnna 4320 non-null float64
dtypes: float64(9), int64(1), object(2)
memory usage: 414.7+ KB
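If you want the list of numeric and text columns as Python objects rather than reading them off the .info() printout, select_dtypes() is one option. The sketch below uses a small made-up frame (toy, with values copied from the first rows shown above) instead of the real file, so it runs on its own.

```python
import pandas as pd

# Toy stand-in for the PWT file: two text columns, two numeric columns
toy = pd.DataFrame({
    "iso3": ["ABW", "CHE"],
    "Country": ["Aruba", "Switzerland"],
    "year": [2000, 2000],
    "pop": [0.088761, 7.184007],
})

# Split the column names by type
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
text_cols = toy.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)  # ['year', 'pop']
print(text_cols)     # ['iso3', 'Country']
```

The same two calls should work unchanged on df.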
# 4. Country Count
# We use .nunique() on the 'Country' column to count unique entries
num_countries = df['Country'].nunique()
print(f"There are {num_countries} countries in this dataset.")
There are 185 countries in this dataset.
# 5. Summary Statistics
# .describe() gives us the mean, std, and quartiles for numeric columns
df.describe()
| | year | avh | ctfp | emp | hc | labsh | pl_con | pop | rgdpna | rnna |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 4422.000000 | 2812.000000 | 2880.000000 | 4302.000000 | 3480.000000 | 3350.000000 | 4422.000000 | 4422.000000 | 4.422000e+03 | 4.320000e+03 |
| mean | 2011.535957 | 1955.190185 | 0.661380 | 16.943444 | 2.545427 | 0.503176 | 0.528296 | 38.285502 | 6.300825e+05 | 2.883834e+06 |
| std | 6.912838 | 313.353559 | 0.262749 | 68.968464 | 0.692214 | 0.122610 | 1.265803 | 141.094358 | 2.181730e+06 | 9.505130e+06 |
| min | 2000.000000 | 1313.570000 | 0.051811 | 0.001760 | 1.069451 | 0.084364 | 0.065918 | 0.004420 | 6.765967e+01 | 7.886124e+02 |
| 25% | 2006.000000 | 1694.320000 | 0.458319 | 0.829782 | 1.976296 | 0.433389 | 0.305083 | 2.041515 | 1.942828e+04 | 6.750847e+04 |
| 50% | 2012.000000 | 1947.600000 | 0.646390 | 3.390750 | 2.631051 | 0.516634 | 0.410670 | 8.092902 | 6.798322e+04 | 2.887282e+05 |
| 75% | 2018.000000 | 2178.300000 | 0.844167 | 10.190027 | 3.113224 | 0.591901 | 0.584906 | 26.208164 | 3.798980e+05 | 1.654450e+06 |
| max | 2023.000000 | 2706.890000 | 2.382832 | 774.418213 | 3.986023 | 0.911995 | 56.723629 | 1438.069596 | 3.104758e+07 | 1.710951e+08 |
Exercise 2: Basic Filtering
In this exercise, you will practice extracting specific subsets of data from your main DataFrame df.
Your Tasks:
- The Recent Slice: Create a new DataFrame called df_2023 that contains only observations for the year 2023.
- The Country Focus: Create a new DataFrame called df_switzerland that contains all years of data for Switzerland.
- High-Population Economies: Filter the dataset to find all observations where the population ('pop') was greater than 1,000 million (1 billion). Which countries remain?
- Specific Comparison: Create a DataFrame called df_comparison that contains only the data for Germany and France for the years 2000 to 2010 (inclusive). Hint: You will need to use the & operator and parentheses for multiple conditions.
# 1. The Recent Slice
df_2023 = df[df['year'] == 2023]
# 2. The Country Focus
df_switzerland = df[df['Country'] == 'Switzerland']
# 3. High-Population Economies
high_pop = df[df['pop'] > 1000]
high_pop['Country'].unique()
array(['China', 'India'], dtype=object)
# 4. Specific Comparison
# We use .isin() for the countries and logical operators for the years
countries = ['Germany', 'France']
mask = (df['Country'].isin(countries)) & (df['year'] >= 2000) & (df['year'] <= 2010)
df_comparison = df[mask]
print(f"Size of comparison group: {len(df_comparison)}")
Size of comparison group: 22
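The same kind of filter can be written in other ways. The sketch below shows two alternatives on a small made-up frame (toy): .between(), which is inclusive on both ends by default, and .query(), where @ refers to a Python variable. Both are offered as options, not as the required solution.

```python
import pandas as pd

# Toy data: only two rows should survive the filter
toy = pd.DataFrame({
    "Country": ["Germany", "Germany", "France", "France", "Italy"],
    "year": [1999, 2005, 2010, 2011, 2005],
})
countries = ["Germany", "France"]

# Variant 1: .between() replaces the two year comparisons
mask = toy["Country"].isin(countries) & toy["year"].between(2000, 2010)
subset_between = toy[mask]

# Variant 2: .query() expresses the whole condition as a string
subset_query = toy.query(
    "Country in @countries and year >= 2000 and year <= 2010"
)

print(subset_between)
print(subset_between.equals(subset_query))  # True
```

Which form to use is mostly a matter of readability; the boolean mask with & and parentheses is the most explicit.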
Exercise 3: Creating New Variables
In this exercise, you will create derived variables to help analyze productivity and living standards across different countries.
Your Tasks:
- GDP per Capita: Create a new column called gdp_pc by dividing Real GDP (rgdpna) by Population (pop).
- Capital Intensity: Create a column called k_labor which represents the amount of Capital Stock (rnna) available per person engaged (emp).
- Labor Productivity: Create a column called labor_prod by dividing Real GDP (rgdpna) by the total number of hours worked (emp × avh).
  Note: avh has many missing values, which will result in NaN for those specific rows in your new column. We’ll learn how to deal with missing values next week.
- Display Results: Show the first 5 rows of the newly created variables, together with country and year.
# 1. GDP per Capita
# Formula: Real GDP / Population
df['gdp_pc'] = df['rgdpna'] / df['pop']
# 2. Capital Intensity (Capital per worker)
# Formula: Capital Stock / Employment
df['k_labor'] = df['rnna'] / df['emp']
# 3. Labor Productivity (Output per hour)
# We multiply employment by average hours to get total labor hours
df['labor_prod'] = df['rgdpna'] / (df['emp'] * df['avh'])
# Display the first few rows to verify the new columns
df[['Country', 'year', 'gdp_pc', 'k_labor', 'labor_prod']].head()
| | Country | year | gdp_pc | k_labor | labor_prod |
|---|---|---|---|---|---|
| 0 | Aruba | 2000 | 44869.889806 | 272850.962900 | NaN |
| 1 | Aruba | 2001 | 43397.059250 | 281009.204154 | NaN |
| 2 | Aruba | 2002 | 42939.636086 | 292782.673191 | NaN |
| 3 | Aruba | 2003 | 42729.422894 | 306988.439240 | NaN |
| 4 | Aruba | 2004 | 45525.960200 | 322389.519304 | NaN |
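The NaN values in labor_prod come from arithmetic with a missing avh: any calculation involving NaN returns NaN. The sketch below reproduces this on a made-up two-row frame (toy, with invented numbers) and shows that .mean() skips missing values by default.

```python
import numpy as np
import pandas as pd

# Toy data: the first row has no hours-worked information
toy = pd.DataFrame({
    "rgdpna": [1000.0, 2000.0],
    "emp": [2.0, 4.0],
    "avh": [np.nan, 1600.0],
})

# NaN in avh propagates into the derived column
toy["labor_prod"] = toy["rgdpna"] / (toy["emp"] * toy["avh"])

n_missing = int(toy["labor_prod"].isna().sum())  # rows without a result
mean_prod = toy["labor_prod"].mean()             # NaN rows are skipped

print(toy)
print(n_missing, mean_prod)  # 1 0.3125
```

Because rgdpna is in millions of USD and emp is in millions of persons, the millions cancel and labor_prod is measured in 2021 USD per hour worked.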
Exercise 4: Aggregation and Grouping
In this exercise, you will practice summarizing your data. Instead of looking at individual rows, we want to understand broader trends by country or by year.
Your Tasks:
- Global Trends: Calculate the total world population and the average TFP level for every year in the dataset.
- Country Profiles: For each country, find their maximum GDP per capita ('gdp_pc') and their average labor share ('labsh') across all available years.
- Productivity Trend: Group the data by year and calculate the mean Labor Productivity ('labor_prod').
- Summary Table: Create a new DataFrame called country_stats that shows the mean, min, and max of the Human Capital index ('hc') for every country.
# We group by 'year' and sum population, then average TFP
global_trends = df.groupby('year').agg({
'pop': 'sum',
'ctfp': 'mean'
}).reset_index()
global_trends.head()
| | year | pop | ctfp |
|---|---|---|---|
| 0 | 2000 | 6085.879992 | 0.663150 |
| 1 | 2001 | 6168.092024 | 0.664814 |
| 2 | 2002 | 6248.875966 | 0.672702 |
| 3 | 2003 | 6329.244726 | 0.662922 |
| 4 | 2004 | 6410.522290 | 0.663620 |
# Find the peak wealth and typical labor share for each nation
country_profiles = df.groupby('Country').agg({
'gdp_pc': 'max',
'labsh': 'mean'
}).reset_index()
country_profiles.head()
| | Country | gdp_pc | labsh |
|---|---|---|---|
| 0 | Albania | 16316.157654 | 0.806560 |
| 1 | Algeria | 14541.418972 | NaN |
| 2 | Angola | 8631.597586 | 0.284984 |
| 3 | Anguilla | 31024.327373 | NaN |
| 4 | Antigua and Barbuda | 32240.787743 | NaN |
# Labor Productivity over time
productivity_trend = df.groupby('year')['labor_prod'].mean()
productivity_trend.head()
# Optional: Quick visualization to check the trend
# productivity_trend.plot(title="Global Labor Productivity Over Time")
year
2000 33.359983
2001 33.657953
2002 34.408001
2003 35.299476
2004 36.264415
Name: labor_prod, dtype: float64
# Using .agg() with a list of functions for a single column
country_stats = df.groupby('Country')['hc'].agg(['mean', 'min', 'max']).reset_index()
country_stats.head()
| | Country | mean | min | max |
|---|---|---|---|---|
| 0 | Albania | 2.929273 | 2.801147 | 3.018473 |
| 1 | Algeria | 2.145182 | 1.885080 | 2.508918 |
| 2 | Angola | 1.419252 | 1.296941 | 1.544955 |
| 3 | Anguilla | NaN | NaN | NaN |
| 4 | Antigua and Barbuda | NaN | NaN | NaN |
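The columns mean, min and max in the table above no longer say which variable they summarize. One possible variation is named aggregation, which lets you choose the output column names. The sketch below uses a made-up frame (toy) with hypothetical countries A and B; B has no hc data at all, which shows where the all-NaN rows come from.

```python
import numpy as np
import pandas as pd

# Toy data: country B has only missing values for hc
toy = pd.DataFrame({
    "Country": ["A", "A", "B", "B"],
    "hc": [2.0, 3.0, np.nan, np.nan],
})

# Named aggregation: new_column=(source_column, function)
country_stats_named = toy.groupby("Country").agg(
    hc_mean=("hc", "mean"),
    hc_min=("hc", "min"),
    hc_max=("hc", "max"),
).reset_index()

print(country_stats_named)
```

Aggregations skip NaN within a group, so a country only ends up with NaN when none of its years has a value.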
Exercise 5: Visualization
In this exercise, you will use matplotlib to create figures that tell a story about global development and productivity.
Your Tasks:
- Import matplotlib
- The Rise of Human Capital: Create a line plot showing the average human capital (hc) for all countries over time. Requirement: Add a title, label your axes, and include a grid.
- Productivity vs. Wealth: Create a scatter plot for the year 2023 where the x-axis is hc (Human Capital) and the y-axis is gdp_pc (GDP per capita). Create the same figure again but use plt.yscale('log').
- The TFP Frontier: Use a horizontal bar chart (plt.barh) to show the ctfp (Total Factor Productivity) for a selection of countries in 2023: United States, China, India, Germany, Switzerland, Brazil, and Nigeria.
- Distribution of Labor Shares: Create a histogram of the labsh (Labor Share) variable for the entire dataset. Use 30 bins to see the shape of the distribution.
import matplotlib.pyplot as plt
# 1. Calculate the average human capital for all countries by year
hc_trend = df.groupby('year')['hc'].mean()
hc_trend = hc_trend.reset_index()
# 2. Create the figure
# Initialize figure and set a figsize
plt.figure(figsize=(8, 5))
# plot year on x-axis and 'hc' on y axis
plt.plot(hc_trend['year'], hc_trend['hc'])
# Set a title and labels
plt.title("Global Average Human Capital Index (2000–2023)")
plt.xlabel("Year")
plt.ylabel("Average Human Capital Index")
# Add a grid
plt.grid(True)
# Show the figure
plt.show()
# Filter your data for 2023
df_2023 = df[df['year'] == 2023]
# Create the scatterplot
plt.figure(figsize=(8, 6))
plt.scatter(df_2023['hc'], df_2023['gdp_pc'])
plt.title("Human Capital vs. GDP per Capita (2023)")
plt.xlabel("Human Capital Index")
plt.ylabel("GDP per Capita (USD)")
plt.show()
# Create the same plot with a log scale for the y-axis
plt.figure(figsize=(8, 6))
plt.scatter(df_2023['hc'], df_2023['gdp_pc'])
plt.yscale('log') # Log scale for GDP is standard in macroeconomics
plt.title("Human Capital vs. GDP per Capita (2023)")
plt.xlabel("Human Capital Index")
plt.ylabel("GDP per Capita (USD, Log Scale)")
plt.show()
# Filter the 2023 data from above for the selected countries and sort by ctfp
selection = ['United States', 'China', 'India', 'Germany','Switzerland', 'Brazil', 'Nigeria']
df_sel = df_2023[df_2023['Country'].isin(selection)]
df_sel = df_sel.sort_values('ctfp')
# Create horizontal barchart
plt.figure(figsize=(8, 5))
plt.barh(df_sel['Country'], df_sel['ctfp'])
plt.title("Total Factor Productivity Levels in 2023 (USA = 1)")
plt.xlabel("TFP Level (Relative to USA)")
plt.show()
plt.figure(figsize=(8, 5))
plt.hist(df['labsh'].dropna(), bins=30, edgecolor='white')
plt.title("Distribution of Labor Shares Across All Observations")
plt.xlabel("Labor Share of GDP")
plt.ylabel("Frequency")
plt.show()
Exercise 6: Growth Accounting - Self-Study
In this independent deep dive, you will apply the ‘Solow Residual’ method to decompose GDP growth into its fundamental drivers (capital, labor, and technology), turning a core macroeconomic theory into a practical data analysis.
In this exercise, we assume a Cobb-Douglas production function:
\[Y_t = A_t K_t^{\alpha}L_t^{1-\alpha}\]
Where \(Y\) is output, \(K\) is capital, \(L\) is labor, and \(A\) is technology (TFP). Our goal is to calculate the growth rate of \(A\). We want to understand how much of a country’s GDP growth comes from the three components (i) labor, (ii) capital and (iii) technological progress.
1. Preparation
- Create a new DataFrame ga_df that includes only the United States and Switzerland
- Sort values by country and year (important for growth calculations later)
# 1. Filter and Sort
countries = ['United States', 'Switzerland']
ga_df = df[df['Country'].isin(countries)].copy()
ga_df = ga_df.sort_values(['Country', 'year'])
ga_df.head()
| | iso3 | Country | year | avh | ctfp | emp | hc | labsh | pl_con | pop | rgdpna | rnna | gdp_pc | k_labor | labor_prod |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 744 | CHE | Switzerland | 2000 | 1712.74 | 0.914894 | 3.944163 | 3.528448 | 0.654686 | 0.774927 | 7.184007 | 529301.6875 | 2352489.00 | 73677.780033 | 596448.279214 | 78.353252 |
| 745 | CHE | Switzerland | 2001 | 1672.85 | 0.933410 | 4.013566 | 3.538948 | 0.670151 | 0.778998 | 7.226391 | 537641.6875 | 2405543.00 | 74399.750512 | 599352.970383 | 80.076572 |
| 746 | CHE | Switzerland | 2002 | 1651.26 | 0.960231 | 4.042648 | 3.549480 | 0.684949 | 0.837000 | 7.278752 | 537248.0625 | 2456137.25 | 73810.464005 | 607556.497216 | 80.481013 |
| 747 | CHE | Switzerland | 2003 | 1664.54 | 0.929821 | 4.029146 | 3.560044 | 0.677805 | 1.008451 | 7.333447 | 537074.0000 | 2503626.50 | 73236.228475 | 621378.990849 | 80.080526 |
| 748 | CHE | Switzerland | 2004 | 1694.95 | 0.933693 | 4.041221 | 3.570638 | 0.666347 | 1.112374 | 7.384194 | 551584.1250 | 2557653.75 | 74697.946045 | 632891.311870 | 80.527136 |
2. Calculate Growth Rates
- Create new variables:
  - g_y: GDP growth (growth rate of rgdpna)
  - g_l: Labor growth (growth rate of emp)
  - g_k: Capital growth (growth rate of rnna)
You can calculate growth rates with .pct_change(). It computes the percentage change between each value and the value in the previous row of your DataFrame. This is why sorting is important: we want the change from one year to the next.
Important: We don’t want to accidentally compare the last value of Switzerland to the first value of the United States. To avoid this, group by 'Country' before you calculate the growth rates.
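To see why the grouping matters, the following sketch compares an ungrouped and a grouped .pct_change() on a made-up frame (toy, with invented GDP numbers). The column names g_naive and g_grouped are only for illustration.

```python
import pandas as pd

# Toy data: two countries, two years each, already sorted
toy = pd.DataFrame({
    "Country": ["Switzerland", "Switzerland", "United States", "United States"],
    "year": [2000, 2001, 2000, 2001],
    "rgdpna": [100.0, 110.0, 1000.0, 1050.0],
}).sort_values(["Country", "year"])

# Ungrouped: the first US row is compared with the last Swiss row
toy["g_naive"] = toy["rgdpna"].pct_change()

# Grouped: each country starts with NaN in its first year
toy["g_grouped"] = toy.groupby("Country")["rgdpna"].pct_change()

print(toy)
```

In the ungrouped column, the first United States row shows a “growth rate” of roughly 809%, which is just the jump from Swiss to US GDP. The grouped version leaves that row as NaN.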
ga_df = ga_df.sort_values(['Country', 'year'])
ga_df['g_y'] = ga_df.groupby('Country')['rgdpna'].pct_change()
ga_df['g_k'] = ga_df.groupby('Country')['rnna'].pct_change()
ga_df['g_l'] = ga_df.groupby('Country')['emp'].pct_change()
ga_df.head()
| | iso3 | Country | year | avh | ctfp | emp | hc | labsh | pl_con | pop | rgdpna | rnna | gdp_pc | k_labor | labor_prod | g_y | g_k | g_l |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 744 | CHE | Switzerland | 2000 | 1712.74 | 0.914894 | 3.944163 | 3.528448 | 0.654686 | 0.774927 | 7.184007 | 529301.6875 | 2352489.00 | 73677.780033 | 596448.279214 | 78.353252 | NaN | NaN | NaN |
| 745 | CHE | Switzerland | 2001 | 1672.85 | 0.933410 | 4.013566 | 3.538948 | 0.670151 | 0.778998 | 7.226391 | 537641.6875 | 2405543.00 | 74399.750512 | 599352.970383 | 80.076572 | 0.015757 | 0.022552 | 0.017597 |
| 746 | CHE | Switzerland | 2002 | 1651.26 | 0.960231 | 4.042648 | 3.549480 | 0.684949 | 0.837000 | 7.278752 | 537248.0625 | 2456137.25 | 73810.464005 | 607556.497216 | 80.481013 | -0.000732 | 0.021032 | 0.007246 |
| 747 | CHE | Switzerland | 2003 | 1664.54 | 0.929821 | 4.029146 | 3.560044 | 0.677805 | 1.008451 | 7.333447 | 537074.0000 | 2503626.50 | 73236.228475 | 621378.990849 | 80.080526 | -0.000324 | 0.019335 | -0.003340 |
| 748 | CHE | Switzerland | 2004 | 1694.95 | 0.933693 | 4.041221 | 3.570638 | 0.666347 | 1.112374 | 7.384194 | 551584.1250 | 2557653.75 | 74697.946045 | 632891.311870 | 80.527136 | 0.027017 | 0.021580 | 0.002997 |
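The Solow residual formula needs a capital share alpha, and the later output tables contain two columns, cap_share_annual and alpha, whose construction is not shown. The values there are consistent with cap_share_annual = 1 - labsh and with alpha being the average of cap_share_annual within each country. The sketch below is a reconstruction under that assumption, not necessarily the original code. The helper name add_capital_share and the United States labsh values are made up.

```python
import pandas as pd

def add_capital_share(frame):
    """Add cap_share_annual (1 - labsh) and its country average alpha."""
    out = frame.copy()
    # Capital share is whatever is not paid to labor
    out["cap_share_annual"] = 1 - out["labsh"]
    # One constant alpha per country: the mean over all its years
    out["alpha"] = out.groupby("Country")["cap_share_annual"].transform("mean")
    return out

# Toy data: Swiss labsh for 2000-2001 from the table above, US values invented
toy = pd.DataFrame({
    "Country": ["Switzerland", "Switzerland", "United States", "United States"],
    "year": [2000, 2001, 2000, 2001],
    "labsh": [0.654686, 0.670151, 0.60, 0.62],
})
toy = add_capital_share(toy)
print(toy)
```

Applied to the full data, this would be ga_df = add_capital_share(ga_df). With only two years per country, the toy alpha values differ from the full-sample ones.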
4. The Solow Residual:
We want to calculate TFP growth (i.e. the growth of \(A\) in the production function).
- Calculate TFP growth using the following formula:
\[g_A = g_Y - [\alpha g_K+(1-\alpha)g_L]\]
- \(g_L\): growth of labour (g_l)
- \(g_K\): growth of capital (g_k)
- \(g_Y\): growth of GDP (g_y)
- \(\alpha\): capital share (alpha)
If you want to derive this formula: take the natural log \(\ln\) of the production function, then use the fact that the change in the natural log of a variable is approximately equal to its growth rate \(g\).
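Written out, the derivation looks roughly as follows. It assumes that \(\alpha\) is constant over time and writes \(\Delta\) for the change from year \(t-1\) to year \(t\).

Take logs of the production function:
\[\ln Y_t = \ln A_t + \alpha \ln K_t + (1-\alpha)\ln L_t\]
Take the difference between two consecutive years:
\[\Delta \ln Y_t = \Delta \ln A_t + \alpha\,\Delta \ln K_t + (1-\alpha)\,\Delta \ln L_t\]
Replace each log difference by the corresponding growth rate, \(\Delta \ln X_t \approx g_X\), and solve for \(g_A\):
\[g_Y \approx g_A + \alpha g_K + (1-\alpha) g_L \quad\Rightarrow\quad g_A \approx g_Y - [\alpha g_K + (1-\alpha) g_L]\]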
# 4. Calculate TFP Growth
ga_df['g_tfp'] = ga_df['g_y'] - (ga_df['alpha'] * ga_df['g_k'] + (1 - ga_df['alpha']) * ga_df['g_l'])
ga_df.head()
| | iso3 | Country | year | avh | ctfp | emp | hc | labsh | pl_con | pop | ... | rnna | gdp_pc | k_labor | labor_prod | g_y | g_k | g_l | cap_share_annual | alpha | g_tfp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 744 | CHE | Switzerland | 2000 | 1712.74 | 0.914894 | 3.944163 | 3.528448 | 0.654686 | 0.774927 | 7.184007 | ... | 2352489.00 | 73677.780033 | 596448.279214 | 78.353252 | NaN | NaN | NaN | 0.345314 | 0.341327 | NaN |
| 745 | CHE | Switzerland | 2001 | 1672.85 | 0.933410 | 4.013566 | 3.538948 | 0.670151 | 0.778998 | 7.226391 | ... | 2405543.00 | 74399.750512 | 599352.970383 | 80.076572 | 0.015757 | 0.022552 | 0.017597 | 0.329849 | 0.341327 | -0.003532 |
| 746 | CHE | Switzerland | 2002 | 1651.26 | 0.960231 | 4.042648 | 3.549480 | 0.684949 | 0.837000 | 7.278752 | ... | 2456137.25 | 73810.464005 | 607556.497216 | 80.481013 | -0.000732 | 0.021032 | 0.007246 | 0.315051 | 0.341327 | -0.012684 |
| 747 | CHE | Switzerland | 2003 | 1664.54 | 0.929821 | 4.029146 | 3.560044 | 0.677805 | 1.008451 | 7.333447 | ... | 2503626.50 | 73236.228475 | 621378.990849 | 80.080526 | -0.000324 | 0.019335 | -0.003340 | 0.322195 | 0.341327 | -0.004724 |
| 748 | CHE | Switzerland | 2004 | 1694.95 | 0.933693 | 4.041221 | 3.570638 | 0.666347 | 1.112374 | 7.384194 | ... | 2557653.75 | 74697.946045 | 632891.311870 | 80.527136 | 0.027017 | 0.021580 | 0.002997 | 0.333653 | 0.341327 | 0.017677 |
5 rows × 21 columns
5. Summarize and Plot
First we want to create a summary table
- Create a table summary with the mean of the growth variables and alpha ['g_y', 'g_k', 'g_l', 'alpha', 'g_tfp'] by country.
- The raw growth rates of Capital (g_k) and Labor (g_l) don’t tell the whole story. We must multiply them by their respective “shares” in the economy to see how much they actually contributed to total GDP growth:
  - Create a variable contrib_k: The contribution of capital, which is \(\alpha g_K\)
  - Create a variable contrib_l: The contribution of labor, which is \((1-\alpha)g_L\)
# 1. Average the components
summary = ga_df.groupby('Country')[['g_y', 'g_k', 'g_l', 'alpha', 'g_tfp']].mean()
# 2. Calculate the final contributions
summary['contrib_k'] = summary['alpha'] * summary['g_k']
summary['contrib_l'] = (1 - summary['alpha']) * summary['g_l']
# 3. Show the summary table
summary = summary.reset_index()
summary
| | Country | g_y | g_k | g_l | alpha | g_tfp | contrib_k | contrib_l |
|---|---|---|---|---|---|---|---|---|
| 0 | Switzerland | 0.018271 | 0.020339 | 0.012318 | 0.341327 | 0.003215 | 0.006942 | 0.008113 |
| 1 | United States | 0.021028 | 0.019545 | 0.008397 | 0.400359 | 0.008168 | 0.007825 | 0.005035 |
Then we create a stacked barchart to visualize our results.
The code below creates a figure that decomposes average annual GDP growth into the specific contributions of capital accumulation, labor input, and Total Factor Productivity (TFP) to illustrate the underlying drivers of economic growth for each country.
We haven’t covered “stacked” charts in class. Try to deduce how this works by looking at the three plt.bar() calls.
# 1. Setup the data from the columns
countries = summary['Country']
tfp_part = summary['g_tfp']
k_part = summary['contrib_k']
l_part = summary['contrib_l']
# 2. Create the figure
plt.figure(figsize=(8, 4.5))
# Plot Layer 1: TFP (The base)
plt.bar(countries, tfp_part, label='TFP Growth')
# Plot Layer 2: Capital (Stacked on top of TFP)
plt.bar(countries, k_part, bottom=tfp_part, label='Capital Contribution')
# Plot Layer 3: Labor (Stacked on top of TFP + Capital)
plt.bar(countries, l_part, bottom=tfp_part + k_part, label='Labor Contribution')
# 3. Add styling
plt.ylabel("Average Annual Growth Rate")
plt.title("Growth Accounting Decomposition")
plt.legend()
plt.tight_layout()
plt.show()
We call plt.bar() three separate times. Each call tells Matplotlib to draw a set of bars on the same figure.
- The first call draws the TFP bars.
- The second call draws the Capital bars.
- The third call draws the Labor bars.
The bottom Argument: This is the secret to “stacking.” By default, Matplotlib starts every bar at zero.
- In the second call, we set bottom=tfp_part. This tells Matplotlib: “Don’t start the Capital bars at zero; start them at the height where the TFP bars ended.”
- In the third call, we set bottom=tfp_part + k_part. This stacks the Labor bars on top of both the TFP and Capital layers combined.
Labels and Legend:
- Inside each plt.bar() call, we provide a label (e.g., ‘TFP Growth’).
- However, these labels won’t show up on the chart by themselves. We must call plt.legend() at the end. This command looks at all the labels we’ve defined and creates the “Key” or “Legend” in the corner of the chart so the reader knows which color represents which economic component.
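The stacking logic can also be checked numerically, without drawing anything. The sketch below rebuilds the summary table from the rounded values printed earlier and computes where each layer starts and ends. The names layers, tops, bottoms and gap are made up for this illustration.

```python
import pandas as pd

# Summary values copied (rounded) from the table above
summary = pd.DataFrame({
    "Country": ["Switzerland", "United States"],
    "g_y": [0.018271, 0.021028],
    "g_tfp": [0.003215, 0.008168],
    "contrib_k": [0.006942, 0.007825],
    "contrib_l": [0.008113, 0.005035],
})

# Layers in the order they are drawn
layers = summary[["g_tfp", "contrib_k", "contrib_l"]]

# Running total across the layers = top edge of each bar segment
tops = layers.cumsum(axis=1)
# Each segment starts where the previous one ended
bottoms = tops - layers

# The top of the last layer should match average GDP growth
gap = (tops["contrib_l"] - summary["g_y"]).abs()

print(bottoms)
print(gap)
```

The bottoms column for contrib_k equals g_tfp, and the one for contrib_l equals g_tfp + contrib_k, which are exactly the bottom arguments used in the chart. The total height matches g_y up to rounding. Stacking this way reads cleanly when all layers are non-negative, as they are here; with a negative layer the bars would overlap.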