Kaggle Survey Analysis

Are you currently looking for a data science job in India?

Nilay Chauhan
5 min read · Feb 14, 2021

Are you a data science enthusiast? If so, you probably have the same questions I do, because professional data fields include many different kinds of job positions. Without understanding each job title, you would not know where to start. Let’s take a look at who the data professionals currently working in India really are, then build a list of the required skills and take away some practical tips from them.

Which of the various data jobs is the right option for you?

Different job titles, different roles
You might search only for “Data Scientist” or “Machine Learning Engineer” on a job search engine, but you may be surprised that many DS & ML professionals on the Kaggle platform describe their professions with many different job titles. Here are short descriptions of some of the more confusing job titles.

- Machine Learning Engineer: A software engineer who leverages big data tools and programming frameworks to ensure that the raw data gathered from data pipelines are redefined as data science models that are ready to scale as needed. They’re also responsible for taking theoretical data science models and helping scale them out to production-level models that can handle terabytes of real-time data (springboard.com)

- Data Scientist: A data professional who applies statistics, machine learning, and analytics approaches to solve critical business problems, and who is also expected to have strong programming skills, the ability to design new algorithms, and some domain expertise to handle big data (cognitiveclass.ai)

- Data Analyst: A data professional who can query and process data, provide reports, and summarize and visualize data for their organization, but who, most of the time, is not expected to analyze big data or develop new algorithms (cognitiveclass.ai)

- Data Engineer: A software engineer who prepares the “big data” infrastructure to be analyzed by data scientists; more specifically, they design, build, and integrate data from various resources, and manage big data to optimize the performance of their company’s big data ecosystem (cognitiveclass.ai)

- Database Administrator (DBA)/Database Engineer: A software engineer who stores and organizes data, with responsibilities such as capacity planning, installation, configuration, database design, migration, performance monitoring, security, and troubleshooting, as well as backup and data recovery (wikipedia.org)

- Business Analyst (BA): A data professional who analyzes an organization or business domain (real or hypothetical) and documents its business, processes, or systems, assessing the business model or its integration with technology (wikipedia.org)

- Other Professions working with Data: Research Scientist, Software Engineer, Statistician and Product/Project Manager.

To better understand each job title, let’s look at the figure below, which compares job roles across the different job titles. Here are some findings from the comparison:

  1. The most common roles across all data professionals are Data Analysis (26.88%), Prototyping (18.35%), and Data Infrastructure (15.64%).
  2. Data scientists do more data analysis tasks than ML engineers, while ML engineers focus more on prototyping and building machine learning services.
  3. Data engineers and DBAs spend a third of their working time building the data infrastructure.
  4. Data analysts and business analysts spend almost half of their activities analyzing and understanding data to influence product or business decisions.
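
For readers who want to reproduce this kind of comparison, here is a minimal sketch of how role shares per job title could be computed. It uses only the Python standard library and a handful of made-up rows; the data layout, the role labels, and the `role_shares` helper are my own assumptions for illustration, not the actual survey schema or the code behind the figure.

```python
from collections import Counter, defaultdict

# Made-up rows for illustration only: (job title, roles the respondent selected).
responses = [
    ("Data Scientist", ["Data Analysis", "Prototyping"]),
    ("Data Scientist", ["Data Analysis"]),
    ("ML Engineer", ["Prototyping", "ML Service"]),
    ("Data Engineer", ["Data Infrastructure", "Data Analysis"]),
]


def role_shares(rows):
    """Return {job_title: {role: percent of that title's role selections}}."""
    counts = defaultdict(Counter)
    for title, roles in rows:
        counts[title].update(roles)

    shares = {}
    for title, counter in counts.items():
        total = sum(counter.values())
        shares[title] = {
            role: round(100 * n / total, 2) for role, n in counter.items()
        }
    return shares


shares = role_shares(responses)
for title, by_role in shares.items():
    print(title, by_role)
```

Because each respondent can select several roles, the percentages here are shares of all selections made under a job title, not shares of respondents.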

Meet the Data Professionals: Gender, Age and Compensation

Let’s meet the data professionals currently working in North America. Based on the three graphs below, the typical data professional using the Kaggle platform is male (77%), 25–40 years old (49%), and expected to make 100k–150k US dollars per year. Check each figure for the details of the job title you are interested in.

What experience is required? Programming, Machine Learning, or Higher Education?

Based on the Kaggle Survey, most data professionals have 3–5 years of programming experience and 1–2 years of experience using machine learning. However, both machine learning engineers and data scientists seem to have more experience (3–5 years) using machine learning than people in other data jobs. Also, a master’s degree is the most common level of education across most data jobs, except for research scientists and statisticians.
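
A statement like "most data scientists have 3–5 years of experience" boils down to finding the most frequent answer bracket for each job title. Below is a small sketch of that idea, again with made-up rows and a hypothetical `most_common_bracket` helper rather than the real survey columns.

```python
from collections import Counter, defaultdict

# Made-up rows for illustration only: (job title, ML experience bracket).
rows = [
    ("Data Scientist", "3-5 years"),
    ("Data Scientist", "3-5 years"),
    ("Data Scientist", "1-2 years"),
    ("Data Analyst", "1-2 years"),
    ("Data Analyst", "1-2 years"),
    ("Data Analyst", "< 1 year"),
]


def most_common_bracket(rows):
    """Return {job_title: most frequently selected experience bracket}."""
    by_title = defaultdict(Counter)
    for title, bracket in rows:
        by_title[title][bracket] += 1
    # most_common(1) returns a one-item list of (bracket, count) pairs.
    return {title: c.most_common(1)[0][0] for title, c in by_title.items()}


modes = most_common_bracket(rows)
print(modes)
```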

What skills do you need to make your resume stand out? Professionals vs. Students

For a programmer, skill sets play a big role in matching jobs. Skill sets might differ across job fields, but I simplified the data to compare what data professionals use on a regular basis with what students learn or practice to make their resumes stand out. Here is a summary of the analysis from the graph below.

  1. Programming Languages: The top language in the DS & ML community is Python, as expected. Among other languages, students prefer to learn R over SQL, while more professionals work with SQL than with R.
  2. Hosted Notebooks: Around 30% of both professionals and students don’t use hosted notebooks on a regular basis, but those who use notebooks regularly favour working in Colab, Kaggle, and Jupyter.
  3. IDEs: The most dominant IDE used by both groups is the Jupyter environment.
  4. Visualization Tools: The top four tools preferred by both groups are Matplotlib, Seaborn, Ggplot, and Plotly.
  5. ML Frameworks: The top three ML frameworks (Scikit-learn, TensorFlow, and Keras) are the same for both groups, but more students use PyTorch than XGBoost, while professionals use both at similar rates.
  6. ML Algorithms: While both groups use Linear/Logistic Regression and Decision Trees/Random Forests regularly, professionals use Gradient Boosting Machines more than Convolutional Neural Networks and Bayesian Approaches in their work.
  7. Learning Platform: Kaggle professionals think that the best learning platform is Coursera (21%), followed by Kaggle Learn Courses (12%), while students prefer to study through University Courses (21%) and Coursera (18%).
  8. Media Sources: Professionals usually share or report on data science topics via Blogs (19%), Kaggle (16%), or YouTube (14%), in that order, while students share on YouTube (19%) the most, followed by Kaggle (17%) and Blogs (14%).
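
The professionals-versus-students comparison can be sketched in a few lines as well. The example below computes, for each group, the percentage of respondents who use a given tool. The respondents, the group labels, and the `usage_rate` helper are invented for illustration and do not reflect the real survey numbers.

```python
from collections import Counter

# Made-up respondents for illustration only: (group, tools used regularly).
respondents = [
    ("professional", {"Python", "SQL"}),
    ("professional", {"Python", "SQL", "R"}),
    ("professional", {"Python"}),
    ("professional", {"SQL"}),
    ("student", {"Python", "R"}),
    ("student", {"Python"}),
    ("student", {"Python", "R", "SQL"}),
    ("student", {"R"}),
]


def usage_rate(respondents, group):
    """Return {tool: percent of respondents in `group` who use it}."""
    members = [tools for g, tools in respondents if g == group]
    counts = Counter(tool for tools in members for tool in tools)
    return {tool: 100 * n / len(members) for tool, n in counts.items()}


pro = usage_rate(respondents, "professional")
stu = usage_rate(respondents, "student")
for tool in sorted(set(pro) | set(stu)):
    print(f"{tool}: professionals {pro.get(tool, 0):.0f}%, "
          f"students {stu.get(tool, 0):.0f}%")
```

Dividing by the number of respondents in each group, rather than by the number of selections, keeps the two groups comparable even when one group tends to tick more boxes per person.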

As a job seeker who wants to work in the data science field, I have struggled to figure out which job title suits me. This article does not include all the information or give you perfect answers, but I hope it gives you some idea of how the data jobs differ and what you need to focus on to land your dream job. Don’t forget to check the takeaways before leaving this article!

  - A master’s degree might be the appropriate level of education for getting data jobs.
  - Python, Scikit-learn, and Jupyter Notebook are the most essential skills in the data science field.
  - Check out blogs, Kaggle, and YouTube to communicate with working professionals.

Thank you!

Link to my GitHub Repository:
