Data Scientist

A person with formal training in modern statistical methods who develops AI models, especially machine learning models, that help predict, recommend, or categorize outcomes. Data scientists typically work in languages such as R or Python.

Added Perspectives

What they lead: The data scientist serves as quarterback of the ML lifecycle from start to finish. They take the guiding objectives and questions from the business owner, study available datasets, then devise an ML model to answer those questions. They lead the data and feature engineering phase jointly with the data engineer who best understands the data. They lead the model development phase individually, but partner with the ML engineer to jointly lead the tricky effort of bringing a model into production.

What they learn: The data scientist must learn how to explain statistical principles and ML techniques (and their business implications) in simple terms to business owners. They also must learn basic aspects of data pipelines to collaborate effectively with data engineers. They likely already have the necessary programming expertise to collaborate effectively with ML engineers and DevOps engineers during the ML operations phase.

- Kevin Petrie in The Machine Learning Lifecycle and MLOps: Building and Operationalizing ML Models - Part IV

July 9, 2021 (Blog)

Finally, the most technically savvy self-service users are data scientists. They have an extremely high level of data literacy and are typically fluent in multiple coding languages which they use to build machine learning algorithms and statistical models. Much of their work involves surfacing insights from big data and previously undervalued data. As a result, they need direct access to data that might not be prioritized in existing data warehouse structures.

- Joe Hilleary in Supporting Self-Service Analytics with a Unified Platform

March 12, 2021 (Blog)

There are three types of data scientists: research data scientists, applied data scientists, and citizen data scientists. Research data scientists focus on discovering and applying methods to generate new algorithms. Applied data scientists take well-established models and configure them, using open source libraries or tools like SAS, to solve business problems. An applied data scientist wears many hats, managing machine learning engineering and ETL while thinking about how to deliver business value. An industrialized product can be amplified under the supervision of citizen data scientists. It's better to have citizen data scientists deal with the business because they are less likely to use data science jargon.

- Wayne Eckerson in Alex Vayner: Data Scientists - Who They Are, Where to Find Them and How to Keep Them

August 14, 2019 (Podcast)

Relevant Content

Jul 09, 2019 - Why aren't companies industrializing data science models? Alex Vayner answers in this podcast.

Jan 06, 2019 - It’s not technology that determines success in data analytics; it’s people, processes, and programs, which is the focus of this special report.

Mar 23, 2020 - This report describes why self-service analytics seems so easy in theory but is challenging in practice, and how to do it right. 

Apr 15, 2021 - The lifecycle of machine learning projects spans data and feature engineering, model development, and ML operations or MLOps.

Sep 01, 2016 - To succeed with self-service analytics, organizations need a reference architecture that maps business users, technology, and developers to an...

© Datalere, LLC. All rights reserved