By Brenda Cass, Data Science Strategist

Technologies like machine learning and artificial intelligence (AI) are hot in the healthcare industry these days, as evidenced by efforts from tech giants such as IBM and Google to develop tools for the space. However, the way these technologies are talked about can often be misleading in terms of their capabilities and strengths. With all the hype these terms generate, it can be easy to miss the purpose and application of machine learning, data science, and AI.

When Validic refers to these terms, we are referring to parts of an additive, multi-step process for generating value from data. The process begins with data curation, is enhanced by analytics, and continues to build value through machine learning. Each of these three steps in the data science process builds upon the opportunities created by the last, and each is a technical response to the natural growth of IT-accelerated businesses. Healthcare as an industry is not exempt from these trends – and now is the time to think about how the data science process applies to a healthcare organization.

Data Curation

Data curation is the foundational step in an organization’s data journey and, frequently, the most difficult. Whether the starting point is a legacy, paper-based process or disparate files on individual workstations, an organization’s first opportunity to gain value from its data is to aggregate it, standardize it, and make it generally accessible. HIPAA, with its mandate for electronic storage and standardized data flow, is an excellent example of data curation being applied at an industry level.

Simply digitizing a process, while certainly an important step, is not necessarily enough to fully realize the benefits of well-applied data curation. If conceptually related data cannot be systematically tied together during analysis, if the same metric is referred to by multiple names, or if data is not accessible on demand, the organization likely has more value to gain from deeper data curation.
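As a small illustration of the "same metric, multiple names" problem, the sketch below maps differently named fields from two devices onto one canonical name. The alias table, field names, and sample records are all invented for this example, and it is not a description of how any Validic product works.

```python
# Hypothetical alias table: every source name and canonical name here is
# invented for illustration only.
CANONICAL_NAMES = {
    "steps": "step_count",
    "step_count": "step_count",
    "daily_steps": "step_count",
    "hr": "heart_rate_bpm",
    "heart_rate": "heart_rate_bpm",
    "pulse": "heart_rate_bpm",
}


def normalize_record(record):
    """Return a copy of the record with metric names mapped to canonical ones.

    Names that are not in the alias table are kept unchanged, so no data is
    silently dropped.
    """
    return {
        CANONICAL_NAMES.get(key.lower(), key): value
        for key, value in record.items()
    }


# Two made-up devices reporting the same metrics under different names.
device_a = {"Steps": 8200, "HR": 64}
device_b = {"daily_steps": 10150, "pulse": 71}

normalized = [normalize_record(r) for r in (device_a, device_b)]
for row in normalized:
    print(row)
```

Once both records share the same field names, they can be aggregated or compared without per-device special cases.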

A great example is Validic Inform, which is an excellent option for organizations interested in curating patient-generated health data (PGHD) sourced from many different connected devices. It takes care of standardizing the access and normalizing the shape of PGHD, all for an increasingly diverse set of connected health devices.

Leveraging Data Analytics

Once data has been curated, organizations will naturally want to begin analyzing it for the sort of macro trends that help humans with their decision-making. This is the step commonly referred to as analytics. Often, the seeds of analytics take the form of reports manually constructed in spreadsheets. The real value, however, comes from automated reports containing data that is relevant to the decisions the report consumer has to make. It is critical that tactical-level aggregates draw a clear line to strategic aggregates; otherwise, different parts of the organization risk making decisions that are at odds with each other. For example, if success is measured differently across functions in your organization and it is not clear how those performance indicators tell the story of the organization’s general wellbeing, then analytics are likely not being leveraged to their full potential. If reports become stale without manual intervention or aren’t available on demand, then your organization can likely get more value from analytics than it is currently getting.

In response to these challenges, Validic Impact provides analytics features that build on top of the Validic Inform data connectivity platform. It integrates seamlessly with clinical workflows and enhances the data flow with configurable goals and notifications that help guide user behavior.

Applying Machine Learning

Though analytics does provide immense value, it also answers questions retroactively. The value generated is from finding patterns after, or at best, while they are happening. This inevitably raises the question: what if the organization knew what would happen ahead of time?

This is where machine learning enters an organization’s data science journey. Machine learning enables predictive analysis that estimates when the next significant outcome is likely to happen. These predictive models can also be tuned based on their application. For example, a model that aids in user engagement would be tuned to produce fewer false positives, even at the cost of missing some true cases, to avoid being spammy, while a model that predicts a significant health outcome would be tuned to err on the side of caution and produce fewer false negatives, even at the cost of more false alarms. Look for data that is labeled with significant outcomes to find opportunities where a predictive model can be applied. Validic uses a predictive model to determine the future activity level of its users, which creates an opportunity for preventative intervention that can keep users engaged with their connected devices.
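One common way to do this kind of tuning is to move the decision threshold applied to a model’s scores. The sketch below is a minimal illustration of that idea, assuming a model that outputs a score between 0 and 1; the scores, labels, and both thresholds are made up for the example and do not come from any Validic model.

```python
# Hypothetical model scores (higher = more likely to be a positive case) and
# the true outcomes. All values are invented for illustration.
scores = [0.95, 0.80, 0.65, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]


def confusion_counts(scores, labels, threshold):
    """Count false positives and false negatives at a decision threshold.

    A case is predicted positive when its score is at or above the threshold.
    """
    false_positives = sum(
        1 for s, y in zip(scores, labels) if s >= threshold and y == 0
    )
    false_negatives = sum(
        1 for s, y in zip(scores, labels) if s < threshold and y == 1
    )
    return false_positives, false_negatives


# A high threshold suits the engagement case: few false alarms, more misses.
engagement_fp, engagement_fn = confusion_counts(scores, labels, threshold=0.7)

# A low threshold suits the health-outcome case: few misses, more false alarms.
health_fp, health_fn = confusion_counts(scores, labels, threshold=0.25)

print("engagement model:", engagement_fp, "false positives,", engagement_fn, "false negatives")
print("health model:", health_fp, "false positives,", health_fn, "false negatives")
```

With this toy data, the high threshold yields no false positives but two false negatives, and the low threshold yields the reverse. The same model can serve either goal, and the choice of threshold encodes which kind of error is more costly for the application.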

Beyond prediction, machine learning is also able to automate classification tasks; that is, it can be applied to figure out whether a collection of data fits into a specific category. This sort of model is most notably being applied in the healthcare industry in the form of medical imaging diagnostics, with a high level of success. Classifiers can also be used to identify anomalies, such as fraudulent activity, and related clustering techniques can segment users into groups that share similar behaviors – an important step in building a recommendation system. Organizations with error-prone or time-consuming manual data labeling processes are likely to benefit from automating them with a classification model. In one of its own clustering experiments, Validic discovered the surprising significance of user-reported sleep data when it comes to identifying groups of users with similar behavior. Using insights like this one, customers are able to recommend actions relevant to specific users in order to help them have better outcomes.

What is even more exciting about machine learning models is that they build on top of each other. As with conventional learning, the insights gained from simpler models can be fed as features into more complicated ones to predict or classify more sophisticated things. This quality of machine learning in particular makes partnerships between different organizations’ data science teams very fruitful.
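To make the idea of feeding one model’s output into another concrete, here is a deliberately simplified sketch. Both "models" are hypothetical stand-ins: the first is just an average, the second is a logistic score with invented weights, and neither reflects an actual Validic model or fitted parameters.

```python
import math


def predicted_activity(recent_daily_steps):
    """Simpler model (hypothetical): forecast activity as the mean of recent
    daily step counts."""
    return sum(recent_daily_steps) / len(recent_daily_steps)


def disengagement_risk(recent_daily_steps, days_since_last_sync):
    """More complicated model (hypothetical): a logistic score that uses the
    simpler model's output as one of its input features.

    The weights below are invented for illustration, not fitted to any data.
    """
    activity = predicted_activity(recent_daily_steps)  # feature from model 1
    z = 1.0 - 0.0004 * activity + 0.3 * days_since_last_sync
    return 1.0 / (1.0 + math.exp(-z))


# Two made-up users: one active and recently synced, one fading out.
active_user = disengagement_risk([9000, 11000, 10000], days_since_last_sync=0)
lapsing_user = disengagement_risk([1500, 500, 1000], days_since_last_sync=5)

print(f"active user risk:  {active_user:.2f}")
print(f"lapsing user risk: {lapsing_user:.2f}")
```

In practice both stages would be trained on real data, but the structure is the same: the second model never needs to know how the first feature was produced, which is part of what lets separate teams combine their work.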

Each of these steps is a rich discipline in its own right, deserving of its own in-depth exploration, but it is important to understand the value each can provide, especially as it relates to healthcare today. Validic is actively applying data science principles to the creation of its products. To learn more and get involved in our data science initiatives that are making personal health data actionable and improving the quality of human life, contact us at
