Kaggle Survey Analysis

Are you a data science enthusiast? If so, you probably have the same questions I do, because professional data fields include many different kinds of job positions. Without understanding each job title, you would not know where to start. Let’s look at who the data professionals currently working in North America really are, then build a list of the required skills and take away some practical tips from them.

Which of the various data jobs is the right option for you?

Different job titles, different roles
Presumably, you search only for “Data Scientist” or “Machine Learning Engineer” in a job search engine, but you might be surprised that DS & ML professionals on the Kaggle platform describe their professions with many different job titles. Here are short descriptions of some of the more confusing ones.

- Machine Learning Engineer: A software engineer who leverages big data tools and programming frameworks to ensure that the raw data gathered from data pipelines are redefined as data science models that are ready to scale as needed. They’re also responsible for taking theoretical data science models and helping scale them out to production-level models that can handle terabytes of real-time data — springboard.com

- Data Scientist: A data professional who applies statistics, machine learning, and analytics approaches to solve critical business problems, and who is also expected to have strong programming skills, the ability to design new algorithms, and some domain expertise to handle big data — cognitiveclass.ai

- Data Analyst: A data professional who can query and process data, provide reports, and summarize and visualize data for their organization, but who is usually not expected to analyze big data or develop new algorithms — cognitiveclass.ai

- Data Engineer: A software engineer who prepares the “big data” infrastructure to be analyzed by data scientists. More specifically, they design, build, and integrate data from various resources, and manage big data to optimize the performance of their company’s big data ecosystem — cognitiveclass.ai

- Database Administrator (DBA)/Database Engineer: A software engineer who stores and organizes data. The role includes capacity planning, installation, configuration, database design, migration, performance monitoring, security, and troubleshooting, as well as backup and data recovery — wikipedia.org

- Business Analyst (BA): A data professional who analyzes an organization or business domain (real or hypothetical) and documents its business, processes, or systems, assessing the business model or its integration with technology — wikipedia.org

- Other Professions working with Data: Research Scientist, Software Engineer, Statistician and Product/Project Manager.

To better understand each job title, let’s look at the figure below, which compares job roles across the different job titles. Here are some findings from the comparison:

  1. The most common roles across all data professionals are Data Analysis (26.88%), Prototyping (18.35%), and Data Infrastructure (15.64%).
  2. Data scientists do more data analysis tasks than ML engineers, while ML engineers focus more on prototyping and building machine learning services.
  3. Data engineers and DBAs spend a third of their working time building data infrastructure.
  4. Data analysts and business analysts spend almost half of their time analyzing and understanding data to influence product or business decisions.
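
To make percentages like the ones above concrete, here is a minimal Python sketch of how such shares could be computed when each respondent selects several work activities. The `responses` rows and the `role_shares` helper are hypothetical and made up for illustration. They are not the Kaggle survey data, and this is not necessarily how the figure was produced.

```python
from collections import Counter

# Hypothetical toy responses: (job title, selected work activities).
# These rows are invented for illustration; they are NOT Kaggle survey data.
responses = [
    ("Data Scientist", ["Data Analysis", "Prototyping"]),
    ("Machine Learning Engineer", ["Prototyping", "ML Service"]),
    ("Data Engineer", ["Data Infrastructure", "Data Analysis"]),
    ("Data Analyst", ["Data Analysis"]),
]


def role_shares(rows):
    """Return each activity's share (in percent) of all selected activities."""
    counts = Counter(
        activity for _, activities in rows for activity in activities
    )
    total = sum(counts.values())
    return {
        activity: round(100 * n / total, 2) for activity, n in counts.items()
    }


shares = role_shares(responses)
for activity, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{activity}: {pct}%")
```

Filtering `responses` by job title before calling `role_shares` would give the per-title breakdown that the comparison relies on.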

Meet the Data Professionals: Gender, Age and Compensation

Let’s meet the data professionals currently working in North America. Based on the three graphs below, most of the data professionals using the Kaggle platform are men (77%), are 25–40 years old (49%), and are expected to make 100k–150k US dollars per year. Please check each figure to see the details for the job title you are interested in.

What experience is required? Programming, Machine Learning, or Higher Education?

Based on the Kaggle Survey, most data professionals have 3–5 years of programming experience and 1–2 years of experience using machine learning. However, both machine learning engineers and data scientists seem to have more experience (3–5 years) in using machine learning than people in other data jobs. Also, a master’s degree is the most common level of education across most data jobs, except for research scientists and statisticians.

What skills do you need to make your resume stand out? Professionals vs. Students

For a programmer, skill sets play a major role in job matching. Skill sets might differ across job fields, but I simplified the data to compare what data professionals use on a regular basis with what students learn or practice to make their resumes stand out. Here is a summary of the analysis from the graph below.

  1. Programming Languages: The top language in the DS & ML community is Python, as expected. Among the other languages, students prefer to learn R over SQL, while more professionals work with SQL than with R.
  2. Hosted Notebooks: Around 30% of both professionals and students don’t use hosted notebooks on a regular basis, but those who use notebooks regularly favour working in Colab, Kaggle, and Jupyter.
  3. IDEs: The most dominant IDE used by both groups is the Jupyter environment.
  4. Visualization Tools: The top four tools preferred by both groups are Matplotlib, Seaborn, Ggplot, and Plotly.
  5. ML Frameworks: The top three ML frameworks (Scikit-learn, TensorFlow, and Keras) are the same for both groups, but more students use PyTorch than XGBoost, while professionals use both at similar rates.
  6. ML Algorithms: While both groups use Linear/Logistic Regression and Decision Trees/Random Forests regularly, professionals use Gradient Boosting Machines more than Convolutional Neural Networks and Bayesian Approaches in their work.
  7. Learning Platform: Professionals think that the best learning platform is Coursera (21%), followed by Kaggle Learn Courses (12%), while students prefer to study through University Courses (21%) and Coursera (18%).
  8. Media Sources: Professionals usually share or report on data science topics via Blogs (19%), Kaggle (16%), or YouTube (14%), in that order, while students most like to share on YouTube (19%), followed by Kaggle (17%) and Blogs (14%).
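
The kind of group comparison summarized above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `professionals` and `students` answer lists are invented toy data, not survey results, and `top_n` is a hypothetical helper name.

```python
from collections import Counter


def top_n(answers, n=3):
    """Rank items by how many respondents selected them (most popular first)."""
    counts = Counter(item for selected in answers for item in selected)
    # Sort by count (descending), then by name, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]


# Hypothetical toy answers to "which languages do you use regularly?".
# Invented for illustration; NOT the Kaggle survey data.
professionals = [["Python", "SQL"], ["Python", "SQL", "R"], ["Python"]]
students = [["Python", "R"], ["Python", "R", "SQL"], ["Python"]]

pro_ranking = top_n(professionals)
student_ranking = top_n(students)
print("Professionals:", pro_ranking)
print("Students:", student_ranking)
```

Ranking each group separately and then comparing the two lists side by side is what makes differences such as the SQL versus R ordering easy to see.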

As a job seeker who wants to work in the data science field, I have struggled to figure out which job title suits me. This article does not include all the information or give you perfect answers, but I hope you can get some idea of how the data jobs differ and what you need to focus on to land your dream job. Don’t forget to check the takeaways before leaving this article!

  - A master’s degree might be the appropriate level of education for getting data jobs.
  - Python, Scikit-learn, and Jupyter Notebook are the most essential skills in the data science field.
  - Check out blogs, Kaggle, and YouTube to communicate with working professionals.

Thank you!

Link to my GitHub Repository:

Data Scientist at Google — Kaggle team

Nilay Chauhan
