Grouped Bar Charts and 100% Stacked Bar Charts

In this post I show how to make a grouped bar chart and a 100% stacked bar chart. To start, I load a customer churn dataset (churn.csv). I want to look at the relationship between the number of customer service calls a customer made and whether that customer churned.

import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('churn.csv')
df.head()
   state  account length  area code  phone number  international plan  voice mail plan  number vmail messages  total day minutes  total day calls  total day charge  ...  total eve calls  total eve charge  total night minutes  total night calls  total night charge  total intl minutes  total intl calls  total intl charge  customer service calls  churn
0     KS             128        415      382-4657                  no              yes                     25              265.1              110             45.07  ...               99             16.78                244.7                 91               11.01                10.0                 3               2.70                       1  False
1     OH             107        415      371-7191                  no              yes                     26              161.6              123             27.47  ...              103             16.62                254.4                103               11.45                13.7                 3               3.70                       1  False
2     NJ             137        415      358-1921                  no               no                      0              243.4              114             41.38  ...              110             10.30                162.6                104                7.32                12.2                 5               3.29                       0  False
3     OH              84        408      375-9999                 yes               no                      0              299.4               71             50.90  ...               88              5.26                196.9                 89                8.86                 6.6                 7               1.78                       2  False
4     OK              75        415      330-6626                 yes               no                      0              166.7              113             28.34  ...              122             12.61                186.9                121                8.41                10.1                 3               2.73                       3  False

5 rows × 21 columns

df.groupby('customer service calls')['churn'].value_counts()
customer service calls  churn
0                       False     605
                        True       92
1                       False    1059
                        True      122
2                       False     672
                        True       87
3                       False     385
                        True       44
4                       False      90
                        True       76
5                       True       40
                        False      26
6                       True       14
                        False       8
7                       True        5
                        False       4
8                       False       1
                        True        1
9                       True        2
Name: churn, dtype: int64

This is what I want to visualize. I am first going to use a grouped bar chart.

Grouped Bar Chart

Step 1: Turn the counts above into a DataFrame

# count customer service calls among customers who did not churn
non_churn = df[df['churn'] == False]['customer service calls'].value_counts().sort_index()
# name the Series so it becomes the 'Non_Churn' column
non_churn.rename('Non_Churn', inplace = True)
# count customer service calls among customers who churned
churn = df[df['churn'] == True]['customer service calls'].value_counts().sort_index()
# name the Series so it becomes the 'Churn' column
churn.rename('Churn', inplace = True)

# combine the two Series into a DataFrame, aligned on the number of calls
churn_df = pd.concat([non_churn, churn], axis = 1)
churn_df
   Non_Churn  Churn
0      605.0     92
1     1059.0    122
2      672.0     87
3      385.0     44
4       90.0     76
5       26.0     40
6        8.0     14
7        4.0      5
8        1.0      1
9        NaN      2
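The NaN in the last row comes from how `pd.concat(..., axis=1)` aligns the two Series on their index. Nine calls never occurs among customers who stayed, so that label is missing from the non-churn Series. Below is a small sketch using made-up counts (the Series values are hypothetical, not the churn numbers) that shows the alignment:

```python
import pandas as pd

# Hypothetical value counts (not the real churn numbers): 9 calls only occurs
# among churners, so it is missing from the non-churn Series
non_churn = pd.Series({0: 5, 1: 3}, name='Non_Churn')
churn = pd.Series({0: 2, 1: 1, 9: 2}, name='Churn')

# axis=1 puts each Series in its own column and uses the union of the indexes;
# a label missing from one Series becomes NaN, which turns that column into float64
combined = pd.concat([non_churn, churn], axis=1)
print(combined)
print(combined.dtypes)
```

The same mechanism explains why the Non_Churn column prints as 605.0 rather than 605.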

I am going to fill in the missing value with 0. The value is not really missing: no customer who stayed made 9 customer service calls, so the correct count is 0.

churn_df.fillna(0, inplace = True)
churn_df
   Non_Churn  Churn
0      605.0     92
1     1059.0    122
2      672.0     87
3      385.0     44
4       90.0     76
5       26.0     40
6        8.0     14
7        4.0      5
8        1.0      1
9        0.0      2
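You can also skip building two Series and concatenating them. pandas can produce the same wide table in one step and fill empty cells with 0 as it goes. The sketch below runs on a small made-up DataFrame, `toy`, which is hypothetical. It assumes `churn` holds booleans, as it does in churn.csv. On the real data you would pass `df` instead:

```python
import pandas as pd

# Hypothetical toy data standing in for churn.csv (not the real dataset)
toy = pd.DataFrame({
    'customer service calls': [0, 0, 1, 1, 1, 4, 4, 9],
    'churn': [False, True, False, False, True, True, False, True],
})

# value_counts() gives a long Series indexed by (calls, churn);
# unstack() pivots the churn level into columns and fills absent pairs with 0
wide = (toy.groupby('customer service calls')['churn']
           .value_counts()
           .unstack(fill_value=0))

# pd.crosstab builds the same counts directly from the two columns
ct = pd.crosstab(toy['customer service calls'], toy['churn'])

# rename the boolean columns to match the names used in this post
wide = wide.rename(columns={False: 'Non_Churn', True: 'Churn'})
print(wide)
```

Because the zeros are filled during the pivot, no column is ever converted to float.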

Step 2: Make Plot

with plt.style.context('fivethirtyeight'):
    churn_df.plot(kind = 'bar', figsize = (8, 6))
    plt.xticks(rotation = 0)
    plt.xlabel('Number of Customer Service Calls')
    plt.ylabel('Count of Customers')
    plt.title('Number of Customer Service Calls - Churn vs No Churn')

[Figure: Number of Customer Service Calls - Churn vs No Churn (grouped bar chart)]
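You can get the same side-by-side layout directly in matplotlib. Draw each series with `Axes.bar` and shift it half a bar width left or right of its tick. Below is a minimal sketch with made-up counts (`stayed` and `left` are hypothetical, not the churn data):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical counts per number of calls (not the real churn numbers)
calls = [0, 1, 2, 3]
stayed = [50, 80, 40, 20]
left = [8, 10, 6, 9]

x = np.arange(len(calls))  # one tick position per group
width = 0.4                # each bar takes 0.4 of the 1.0 spacing between ticks

fig, ax = plt.subplots(figsize=(8, 6))
# shift the first series left and the second right so they sit side by side
bars_stayed = ax.bar(x - width / 2, stayed, width, label='No Churn')
bars_left = ax.bar(x + width / 2, left, width, label='Churn')
ax.set_xticks(x)
ax.set_xticklabels(calls)
ax.set_xlabel('Number of Customer Service Calls')
ax.set_ylabel('Count of Customers')
ax.legend()
```

With more than two series, the same idea applies: divide the group width by the number of series and offset each one accordingly.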

Step 3: Interpret the Graph

For 0 to 3 customer service calls, customers who did not churn far outnumber those who did (for example, 1,059 vs. 122 at 1 call). Beyond 3 calls each group is much smaller (166 customers at 4 calls, 66 at 5, and 22 or fewer above that). With groups that small, raw counts make it hard to judge how churn changes.
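One way to make the small groups easier to judge is to put each group's churn rate next to the number of customers behind it. The sketch below assumes `churn` is boolean, so its mean is the churn rate. The DataFrame `toy` is made-up data:

```python
import pandas as pd

# Hypothetical toy data (not churn.csv)
toy = pd.DataFrame({
    'customer service calls': [0, 0, 0, 0, 1, 1, 1, 4, 4, 9],
    'churn': [False, False, False, True, False, False, True, True, False, True],
})

# the mean of a boolean column is the share of True values, i.e. the churn rate;
# 'size' counts how many customers fall in each group
summary = toy.groupby('customer service calls')['churn'].agg(
    churn_rate='mean', customers='size')
print(summary)
```

A 100% churn rate in a group of 2 customers says much less than a 10% rate in a group of 1,000.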

Next I show the same information as a 100% stacked bar chart. Each bar is scaled to 100%, so you compare the share of customers who churned rather than raw counts. Note: I'll be using this great tutorial.

100% Stacked Bar Chart

Step 1: Make Plot

# From raw value to percentage
totals = [i+j for i,j in zip(churn_df['Non_Churn'], churn_df['Churn'])]
greenBars = [i / j * 100 for i,j in zip(churn_df['Non_Churn'], totals)]
orangeBars = [i / j * 100 for i,j in zip(churn_df['Churn'], totals)]

with plt.style.context('fivethirtyeight'):
    # plot
    barWidth = 0.85
    names = churn_df.index
    # Create green bars (no churn), starting at 0
    plt.bar(range(len(churn_df)), greenBars, color='#b5ffb9', edgecolor='white', width=barWidth, label='No Churn')
    # Create orange bars (churn), stacked on top of the green bars
    plt.bar(range(len(churn_df)), orangeBars, bottom=greenBars, color='#f9bc86', edgecolor='white', width=barWidth,
            label='Churn')

    # Custom x axis
    plt.xticks(range(len(churn_df)), names)
    plt.xlabel("Number of Customer Service Calls")
    # each bar sums to 100, so the y axis is a percentage
    plt.ylabel("Percent of Customers")

    # Add a legend outside the plot, to the right
    plt.legend(loc='upper left', bbox_to_anchor=(1,1), ncol=1)

[Figure: 100% stacked bar chart of churn share by number of customer service calls]
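The list comprehensions that compute the percentages can also be written as a single row-wise division in pandas. `DataFrame.plot` can then stack the bars itself with `stacked=True`. Below is a sketch on a made-up table, `counts`, which is hypothetical and uses the same column names as `churn_df`:

```python
import matplotlib
matplotlib.use('Agg')  # draw off-screen so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical counts with the same layout as churn_df (not the real data)
counts = pd.DataFrame({'Non_Churn': [30.0, 45.0, 0.0],
                       'Churn': [10, 5, 2]}, index=[0, 1, 2])

# divide each row by its own total so every row sums to 100
pct = counts.div(counts.sum(axis=1), axis=0) * 100
print(pct)

# stacked=True draws the Churn bar on top of the Non_Churn bar in each row
ax = pct.plot(kind='bar', stacked=True, width=0.85, rot=0,
              color=['#b5ffb9', '#f9bc86'], edgecolor='white')
ax.set_xlabel('Number of Customer Service Calls')
ax.set_ylabel('Percent of Customers')
ax.legend(['No Churn', 'Churn'], loc='upper left', bbox_to_anchor=(1, 1))
```

This version needs no separate `bottom=` bookkeeping, at the cost of less direct control over each bar.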

Step 2: Interpret the Graph

In this plot we see that the share of customers who churn jumps at 4 customer service calls. It rises from about 10% at 3 calls (44 of 429) to about 46% at 4 calls (76 of 166). It stays at 50% or higher for 5 to 9 calls, although those groups are small. My recommendation would be to prioritize customers who have made 2-3 customer service calls before they reach that 4-call threshold!
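To act on that recommendation, you would first pull out the customers who have made 2 or 3 calls and have not churned. Below is a sketch on made-up data (`toy` and `at_risk` are hypothetical names). The selection rule is just the threshold described above, not a tested retention model:

```python
import pandas as pd

# Hypothetical toy data (not churn.csv)
toy = pd.DataFrame({
    'customer service calls': [0, 2, 3, 4, 2, 1],
    'churn': [False, False, False, False, True, False],
})

# between() is inclusive on both ends by default, so this keeps 2 and 3 calls;
# ~toy['churn'] keeps only customers who have not churned
at_risk = toy[toy['customer service calls'].between(2, 3) & ~toy['churn']]
print(at_risk)
```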