Starbucks Capstone Challenge — Analytics Report

Feb 15, 2021

Project Overview:

This project is my submission for Udacity’s Data Scientist Nanodegree Capstone project. The data set contains simulated data that mimics customer behaviour on the Starbucks rewards mobile app. Once every few days, Starbucks sends out an offer to users of the mobile app. An offer can be merely an advertisement for a drink or an actual offer such as a discount or BOGO (buy one, get one free). Some users might not receive any offers during certain weeks, and not all users receive the same offer; that is the challenge to solve with this data set. The task is to combine transaction, demographic and offer data to determine which demographic groups respond best to which offer type. The data set is a simplified version of the real Starbucks app, because the underlying simulator has only one product whereas Starbucks actually sells dozens of products.

Datasets provided for the project:

portfolio.json: offer ids and metadata about each offer.
profile.json: demographic data for each customer.
transcript.json: records for transactions, offers received, offers viewed, and offers completed.
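
To make the "combine the three files" step concrete, here is a minimal sketch of loading line-delimited JSON with pandas and joining it into a single merged_df. The records and the column names (person, offer_id, offer_type and so on) are made up for illustration and are assumptions, not the real schema; with the real files you would pass the file paths to pd.read_json instead of the in-memory buffers.

```python
import io

import pandas as pd

# Tiny made-up records standing in for the three files (illustrative only).
portfolio_json = io.StringIO(
    '{"id": "o1", "offer_type": "bogo"}\n'
    '{"id": "o2", "offer_type": "discount"}\n'
)
profile_json = io.StringIO(
    '{"id": "c1", "gender": "F", "age": 52, "income": 72000}\n'
    '{"id": "c2", "gender": "M", "age": 34, "income": 55000}\n'
)
transcript_json = io.StringIO(
    '{"person": "c1", "event": "offer received", "offer_id": "o1"}\n'
    '{"person": "c1", "event": "offer completed", "offer_id": "o1"}\n'
    '{"person": "c2", "event": "offer received", "offer_id": "o2"}\n'
)

# lines=True reads one JSON object per line.
portfolio = pd.read_json(portfolio_json, lines=True)
profile = pd.read_json(profile_json, lines=True)
transcript = pd.read_json(transcript_json, lines=True)

# Left joins keep every transcript row and attach customer and offer details.
merged_df = (
    transcript
    .merge(profile, left_on="person", right_on="id", how="left")
    .drop(columns="id")
    .merge(portfolio, left_on="offer_id", right_on="id", how="left")
    .drop(columns="id")
)

print(merged_df)
```

Left joins are used here so that no event is silently dropped when a customer or offer record is missing.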

Let’s look at some of my findings.

What is the average income of the customers who use the app?

The average income of the customers who use the app is $65,924.
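
As a rough sketch of how such an average can be computed, assuming a profile table with an income column (the values below are made up and will not reproduce the figure above):

```python
import pandas as pd

# Made-up incomes for illustration; None marks a customer with no income recorded.
profile = pd.DataFrame(
    {
        "id": ["c1", "c2", "c3", "c4"],
        "income": [50000.0, 70000.0, None, 90000.0],
    }
)

# Series.mean() skips missing values by default.
average_income = profile["income"].mean()
formatted = f"${average_income:,.0f}"

print(formatted)
```

Note that missing incomes are left out of the average rather than counted as zero.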

Which offers are used most by the customers?

Here we can see that discount and BOGO offers have almost the same distribution.
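
A comparison like this can be sketched by counting rows per offer type. The offer_type column name and the rows below are assumptions for illustration:

```python
import pandas as pd

# Made-up offer rows; in the notebook this would be a column of merged_df.
offers = pd.DataFrame(
    {"offer_type": ["bogo", "discount", "discount", "bogo", "informational"]}
)

# Count how often each offer type appears, then express it as a share.
offer_counts = offers["offer_type"].value_counts()
offer_share = offer_counts / offer_counts.sum()

print(offer_counts)
print(offer_share)
```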

What is the distribution of age in merged_df?
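
One way to look at the age distribution is to bin ages into groups and count each group. This is only a sketch: the bin edges below are my assumption (chosen so that a 46-60 group exists), and the ages are made up.

```python
import pandas as pd

# Made-up ages for illustration.
ages = pd.Series([22, 35, 47, 52, 60, 68], name="age")

# Assumed bin edges; with the default right=True, each bin includes its upper edge.
bins = [17, 30, 45, 60, 120]
labels = ["18-30", "31-45", "46-60", "61+"]
age_group = pd.cut(ages, bins=bins, labels=labels)

# sort=False keeps the groups in bin order instead of sorting by count.
age_counts = age_group.value_counts(sort=False)

print(age_counts)
```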

What actions did customers take on the offers they received?
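
A simple way to summarise these actions is to count each event and compare views and completions against the offers received. The event labels and rows below are assumptions for illustration:

```python
import pandas as pd

# Made-up events; the labels are assumed to match the transcript's event values.
transcript = pd.DataFrame(
    {
        "event": [
            "offer received", "offer received", "offer received", "offer received",
            "offer viewed", "offer viewed", "offer viewed",
            "offer completed", "offer completed",
            "transaction",
        ]
    }
)

event_counts = transcript["event"].value_counts()

# Share of received offers that were viewed or completed.
received = event_counts["offer received"]
view_rate = event_counts["offer viewed"] / received
completion_rate = event_counts["offer completed"] / received

print(event_counts)
print(f"viewed: {view_rate:.0%}, completed: {completion_rate:.0%}")
```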

What is the gender distribution for each age group?

What is the gender distribution for each offer type?
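
Both gender questions can be answered with the same cross-tabulation, once by age group and once by offer type. A minimal sketch, assuming columns named age_group, gender and offer_type, with made-up rows:

```python
import pandas as pd

# Made-up rows for illustration.
df = pd.DataFrame(
    {
        "age_group": ["18-30", "18-30", "46-60", "46-60", "46-60", "46-60"],
        "gender": ["F", "M", "M", "M", "F", "M"],
        "offer_type": ["bogo", "discount", "bogo", "discount", "bogo", "discount"],
    }
)

# normalize="index" turns each row into shares that sum to 1.
gender_by_age = pd.crosstab(df["age_group"], df["gender"], normalize="index")
gender_by_offer = pd.crosstab(df["offer_type"], df["gender"], normalize="index")

print(gender_by_age)
print(gender_by_offer)
```

Using shares instead of raw counts makes groups of different sizes comparable.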

Which offer type is completed more often than the others?
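
To compare completion across offer types, one option is to divide completed offers by received offers within each type. This is a sketch on made-up rows; the column names and event labels are assumptions.

```python
import pandas as pd

# Made-up offer events for illustration.
df = pd.DataFrame(
    {
        "offer_type": [
            "bogo", "bogo", "bogo",
            "discount", "discount", "discount", "discount",
            "informational",
        ],
        "event": [
            "offer received", "offer received", "offer completed",
            "offer received", "offer received", "offer completed", "offer completed",
            "offer received",
        ],
    }
)

# Rows are offer types, columns are events, cells are counts.
event_table = pd.crosstab(df["offer_type"], df["event"])

# Completed offers as a share of received offers, per offer type.
completion_rate = event_table["offer completed"] / event_table["offer received"]
most_completed = completion_rate.idxmax()

print(completion_rate)
print(most_completed)
```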

We plotted various distributions in the notebook and learned several things from them. Some of them are:

There are more males than females in the data set, and males use the app more than females.
People in the 46–60 age group use the app the most.
The discount offer is used most by the customers.

Both of our models performed well. DecisionTree has the best score on the validation set, which is 84.9. The RandomForestClassifier performed well on the training data but not as well on the test (validation) data. Still, the problem we are trying to solve can also be handled with RandomForestClassifier, since it does not require a high F1 score. So, to predict customer response to an offer, we can use either of the two models.
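
The train versus validation comparison described above can be sketched as follows. This is not the notebook's exact setup: the data is synthetic (make_classification), and the split size and hyperparameters are my assumptions, so the scores it prints will differ from the ones reported here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the engineered features and the offer-response label.
X, y = make_classification(n_samples=1000, n_features=8, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Assumed hyperparameters, chosen only for illustration.
models = {
    "DecisionTreeClassifier": DecisionTreeClassifier(max_depth=5, random_state=42),
    "RandomForestClassifier": RandomForestClassifier(n_estimators=100, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = {
        "train_f1": f1_score(y_train, model.predict(X_train)),
        "val_f1": f1_score(y_val, model.predict(X_val)),
    }
    print(name, scores[name])
```

A large gap between train_f1 and val_f1 for a model is the usual sign that it is overfitting the training data.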

Link to my GitHub Repo: https://github.com/nilaychauhan/Starbucks-Capstone-Challenge

Thank You!