If there’s one thing to take away from this blog post, it’s that data science does not succeed in a bubble. Like any corporate initiative, a data science project requires forethought, planning, and disciplined execution to succeed. But what does that entail? What steps can you take to ensure that your organization’s data science project will be successful? This post dives into the most common characteristics of data science projects that prove to be a valuable investment, one that pays for itself.
Datalere often hears from prospective and current clients who have a history of either failed data science projects or projects that succeeded but were painful to execute. These projects may have been executed internally or by third-party data science consulting firms.
Common complaints from our clients include missed deadlines; in certain cases, work took more than twice as long as originally estimated. The most common complaint, however, is that the data scientists assigned to the project did not have the skills needed to fully complete it, which involved integrating the solution into the organization’s existing systems and then monitoring and fine-tuning it.
So what makes a data science initiative successful? How can your organization avoid failure? The following are four characteristics of successful data science projects.
1. A Team with a Complete Set of Skills and Competencies
Some skills are well known to be in any data scientist’s repertoire—skills such as statistics, mathematics, data manipulation, feature engineering, scripting in R or Python, and using advanced algorithms such as neural nets. These skills represent the core data science knowledge base, inclusive of tools that have become a standard in the field today.
However, in order to fully implement a data science solution in your organization, you also need to plan for many factors that fall outside of a data scientist’s standard mathematical toolkit. A complete set of necessary skills and competencies includes:
- Data Architecture: Can my data scientists access all of the desired data, and will they be able to automate that access to build out data pipelines? We don’t want our data scientists manually importing/exporting CSV files or building custom data pipelines every time a data refresh is desired.
- Business Process and Analysis: If your data science model depends on total revenues, how does your organization define total revenues? If you use a custom definition, will this impact the usability of the output of your data science model? Additionally, your data science project should follow your organization’s standard development processes, potentially inclusive of management methodology (Scrum, Kanban, etc.), source control, release schedules, etc.
- Database Management: Which databases are powering your data science solution? Are they the best databases for your data and your purpose? Correctly modeled data stored in the correct database for your enterprise means that data within your organization is easily accessible and fast to query. Having a centralized database for your data warehouse is pivotal for organizations—you don’t want to create your data science solutions in a vacuum. Each data science solution should be accessible throughout the organization so you don’t have duplication of efforts, which is an especially commonplace problem in large organizations.
- Software and Data Integration: Does your data science solution require integration into an organizational application or into your organization’s existing data warehouse?
- Database Data Models: How are the input/output data for your data science models stored in the database? Creating a single table that houses your data science output may work temporarily, but any data-driven organization will eventually outgrow this naïve form of data storage and will require a more efficiently modeled database that delivers the kind of speed your business users require. Even managed columnar databases require some degree of data modeling to ensure high performance. Think of it this way: if your business users are subjected to a painfully slow end-user experience, they’re far less likely to use the output of your data science model.
- End-user Visualization: How will the non-technical business user access the results of the data science project? They can’t be expected to learn Python or R to interact with the output of the data science algorithm. You also want to build data systems that give business users an interactive, self-service experience, so that your data scientists aren’t fielding daily requests for different views of the algorithm’s output data.
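To make the Data Architecture and Business Process points above a little more concrete, here is a minimal sketch in Python of an automated, repeatable data refresh that applies one shared definition of total revenue. Everything in it is an assumption made for illustration: the function names `total_revenue` and `refresh_model_input`, the `model_input` table, and the revenue formula itself (gross minus refunds minus discounts) are hypothetical, and your organization’s definition and database will differ.

```python
import sqlite3


def total_revenue(gross: float, refunds: float, discounts: float) -> float:
    """Hypothetical shared definition of total revenue (an assumption).

    Keeping the definition in one place means the model input and the
    business reports cannot silently disagree about what it means.
    """
    return gross - refunds - discounts


def refresh_model_input(conn: sqlite3.Connection, orders: list[dict]) -> int:
    """Load orders into a model input table; safe to re-run on every refresh."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS model_input ("
        "order_id INTEGER PRIMARY KEY, total_revenue REAL)"
    )
    rows = [
        (o["order_id"], total_revenue(o["gross"], o["refunds"], o["discounts"]))
        for o in orders
    ]
    # INSERT OR REPLACE keeps the refresh idempotent: re-running it
    # updates existing orders instead of duplicating them.
    conn.executemany("INSERT OR REPLACE INTO model_input VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)
```

A scheduler could call `refresh_model_input` on each data refresh in place of a manual CSV import. SQLite is used here only because it ships with Python; the same pattern applies to whichever database powers your data warehouse.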
2. Understanding the Possibilities
Framing the problem that you’re expected to solve is an important part of the process. How does your business customer request the solution? Oftentimes, business users struggle to articulate their problem. They may not know what’s possible, what an appropriate turnaround for such a request is, or whether the organization even possesses the necessary underlying data. Further, it’s important to ask: how does the implemented solution deliver its value? When and where is the output used? Can the process be automated so that it’s repeatable?
Answers to these questions help you frame the solution for your customer so you can provide the desired value, set reasonable expectations, and ensure that the goals of the data science project are achievable.
3. Development Cycles
Gather, analyze, develop, integrate, test, visualize: following this process in short, iterative cycles allows for stakeholder input, brings problems to the surface early, and lets you adjust the end result as you move through the development stage of your project. When done correctly, this process is an effective way to manage successful data science projects. The opposite of an iterative management approach is to let your data scientists go off on their own for weeks or months at a time, then emerge with a solution they developed in a vacuum. This style of project management is sure to fail.
4. Team Integration
The business and technical teams need to work together. We’ve long known that software development works best when all team members work in short spurts as a tightly knit group. That group includes subject matter experts, business analysts, process managers, software engineers, database administrators, architects, QA staff, end users, and many more. All roles have to be filled, even if one individual plays multiple roles. The same is true of a data science project. It takes careful planning to identify who will fill each necessary role when the project kicks off, and doing so drastically improves your chances of success.
At Datalere, we have the consulting expertise and technical skills to get you there. With a deep bench of experience in database modeling, on-premises and cloud-based databases, building data pipelines, data science, application development, and end-user visualization, our team has expertise in every area necessary to execute any data science project successfully.
Executing a data science project that adds value to your organization doesn’t have to be a risky undertaking, and the right team and approach can ensure successful delivery. Feel free to reach out for a free consultation; we’re happy to talk data science!