Starbucks Capstone Challenge — Analytics Report


Project Overview:

This post describes my submission for the Capstone project of Udacity’s Data Scientist Nanodegree. The data set contains simulated data that mimics customer behaviour on the Starbucks rewards mobile app. Once every few days, Starbucks sends out an offer to users of the mobile app. An offer can be merely an advertisement for a drink or an actual offer such as a discount or BOGO (buy one, get one free). Some users might not receive any offers during certain weeks, and not all users receive the same offer; that is the challenge to solve with this data set. The task is to combine transaction, demographic and offer data to determine which demographic groups respond best to which offer type. The data set is a simplified version of the real Starbucks app because the underlying simulator has only one product, whereas Starbucks actually sells dozens of products.

Datasets provided for the project:

  1. portfolio.json: offer ids and metadata about each offer.
  2. profile.json: demographic data for each customer.
  3. transcript.json: records for transactions, offers received, offers viewed, and offers completed.
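As a rough illustration of how the three files could be combined into one table, here is a minimal sketch with pandas. The column names (`id`, `person`, `event`, `offer_id`, `offer_type`) and the toy rows are assumptions made for this example, not values taken from the real files, and `build_merged_df` is a hypothetical helper.

```python
import pandas as pd

# Toy stand-ins for the three files (assumed columns, invented rows).
portfolio = pd.DataFrame({
    "id": ["o1", "o2"],
    "offer_type": ["bogo", "discount"],
})
profile = pd.DataFrame({
    "id": ["c1", "c2"],
    "gender": ["M", "F"],
    "age": [52, 34],
    "income": [70000.0, 55000.0],
})
transcript = pd.DataFrame({
    "person": ["c1", "c1", "c2", "c2"],
    "event": ["offer received", "offer completed", "offer received", "transaction"],
    "offer_id": ["o1", "o1", "o2", None],  # plain transactions carry no offer
})


def build_merged_df(transcript, profile, portfolio):
    """Attach customer demographics and offer metadata to every event row."""
    merged = transcript.merge(
        profile.rename(columns={"id": "person"}), on="person", how="left"
    )
    merged = merged.merge(
        portfolio.rename(columns={"id": "offer_id"}), on="offer_id", how="left"
    )
    return merged


merged_df = build_merged_df(transcript, profile, portfolio)
print(merged_df[["person", "event", "offer_type", "gender", "age", "income"]])
```

Left joins keep every event row, so transactions that are not tied to an offer stay in the table with a missing `offer_type`.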

Let’s look at some of my findings.

What is the average income of the customers who use the app?

The average income of the customers who use the app is $65,924.
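A figure like this can be computed with a single call in pandas. The sketch below uses invented rows and an assumed `income` column, so its result is not the number reported above.

```python
import pandas as pd

# Toy profile rows (not the real data); None marks a customer with no recorded income.
profile = pd.DataFrame({
    "id": ["c1", "c2", "c3", "c4"],
    "income": [70000.0, 55000.0, None, 64000.0],
})

# Series.mean() skips missing values by default (skipna=True).
average_income = profile["income"].mean()
print(f"Average income: ${average_income:,.0f}")
```

Taking the mean over one row per customer avoids weighting frequent app users more heavily, which would happen if the average were taken over a table with one row per event.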

Which offers are used most by the customers?

Here we can clearly see that discount and BOGO offers have almost the same distributions.

What is the age distribution in merged_df?

What actions did customers take on the offers they received?

What is the gender distribution for each age group?
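One way to produce a gender breakdown per age group is to bin ages with `pd.cut` and cross-tabulate against gender. The bin edges, labels, and rows below are hypothetical choices for illustration; the original notebook may group ages differently.

```python
import pandas as pd

# Toy rows standing in for merged_df (invented values).
merged_df = pd.DataFrame({
    "age": [22, 35, 47, 58, 60, 71, 49],
    "gender": ["F", "M", "M", "F", "M", "M", "M"],
})

# Hypothetical bin edges; intervals are right-inclusive, so age 60 falls in "46-60".
bins = [17, 30, 45, 60, 120]
labels = ["18-30", "31-45", "46-60", "61+"]
merged_df["age_group"] = pd.cut(merged_df["age"], bins=bins, labels=labels)

# Rows are age groups, columns are genders, cells are counts.
gender_by_age = pd.crosstab(merged_df["age_group"], merged_df["gender"])
print(gender_by_age)
```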

What is the gender distribution for each offer type?

Which offer type is completed more often than the others?
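Raw completion counts can favour whichever offer type was sent most often, so a completion rate (completed divided by received) is one reasonable way to compare offer types. This is a sketch on invented rows; the event labels and column names are assumptions.

```python
import pandas as pd

# Toy event rows (invented); each row is one event tied to an offer type.
merged_df = pd.DataFrame({
    "offer_type": ["bogo", "bogo", "bogo", "bogo",
                   "discount", "discount", "discount", "discount"],
    "event": ["offer received", "offer received", "offer viewed", "offer completed",
              "offer received", "offer completed", "offer received", "offer completed"],
})

# Count events per offer type, then divide completions by offers received.
counts = pd.crosstab(merged_df["offer_type"], merged_df["event"])
completion_rate = counts["offer completed"] / counts["offer received"]
print(completion_rate.sort_values(ascending=False))
```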

We plotted various distributions in this notebook and learned several things from them. Some of the findings are:

  • There are more males than females in the data set, and males use the app more than females.
  • People in the 46–60 age group use the app the most.
  • The discount offer is the one used most by customers.

Both of our models performed well. DecisionTree has the best score on the validation set, at 84.9. The RandomForestClassifier performed well on the training data but not as well on the test (validation) data. However, the problem we are trying to solve can also be solved with the RandomForestClassifier, as it does not require a high F1 score. So, to predict customer response to an offer, we can use either of the two models.
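The comparison above can be sketched with scikit-learn as follows. The data here is synthetic (`make_classification`), the hyperparameters are arbitrary, and the F1 metric is assumed to be the score being compared, so the numbers it prints will not match the results reported in this post.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the engineered features and the "responded to offer" label.
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Arbitrary hyperparameters, chosen only for illustration.
models = {
    "DecisionTreeClassifier": DecisionTreeClassifier(max_depth=5, random_state=42),
    "RandomForestClassifier": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Fit each model and record F1 on both the training and validation splits.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = {
        "train_f1": f1_score(y_train, model.predict(X_train)),
        "val_f1": f1_score(y_val, model.predict(X_val)),
    }
    print(name, scores[name])
```

A large gap between `train_f1` and `val_f1` for a model is the usual sign of overfitting, which is the pattern described for the RandomForestClassifier above.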

Link to my GitHub Repo:

Thank You!

Nilay Chauhan

Data Scientist at Google — Kaggle team