December 17, 2019

How to Avoid Hitting the “Delete” Button on Institutional Knowledge

Your data science team members have irreplaceable knowledge about your company that disappears when they move on to another company.

Kaycee Lai

Founder

It’s well understood at this point that data scientists and data engineers have a wealth of expertise that’s difficult to come by and very much in demand. Your data science team, however, has much more than that. They’ve got the ‘lay of the land’–irreplaceable knowledge about your company’s enterprise data landscape, which unfortunately disappears when they move on to another company. Consider that to answer a business question or build a solution to a business challenge, they’ll have to figure out things like:

- Which systems and tables actually hold the relevant data
- How to query and join that data, often across multiple platforms
- Whether the data is current, trustworthy, and in a usable state

These are of course just a few examples of the learnings that make data scientists and engineers a treasure-trove of information–one that can accelerate future analytics projects and even protect your organization from lawsuits or regulatory inquiries. Unfortunately, these treasure-troves don’t stick around long: a recent poll by Burtch Works suggests that data scientists stay at a company for an average of 2.6 years. That means that in less than three years your data science team could be totally different–consider it a proverbial ‘delete’ button on the institutional knowledge of your data landscape that gets pushed three or four times a decade.

Sidestepping the “Delete” Button on Institutional Knowledge 

With the costs and risks of data-related delays growing as the business environment becomes more regulated and competition stiffens, this situation is unacceptable. Not surprisingly, it has become a major stumbling block in companies’ efforts to be data-driven. Randy Bean of NewVantage Partners and Thomas Davenport of Deloitte reported in Harvard Business Review that the share of executives citing the adoption of Big Data as a major challenge rose from 65 percent in 2018 to 75 percent in 2019. Of those respondents, 93 percent identified people and processes as the obstacle.

Fortunately, there are things that can be done to ensure that the institutional knowledge of your data landscape remains intact, regardless of who leaves your organization.

Clearly, the solution to institutional knowledge loss involves capturing information such as the business problem that prompted the question, the question itself, and the tables that held the data.

If the data required to solve a particular problem happened to be in multiple systems such as Oracle DB, Snowflake or Hadoop, the SQL statements that collected that data might be a nightmare to reconstruct, so those must be captured too. 
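
To make this concrete, here is a minimal sketch of what one captured record might contain, assuming a hypothetical AnalysisRecord structure–none of these names come from a real product, and the table and query values are purely illustrative:

```python
from dataclasses import dataclass

@dataclass
class AnalysisRecord:
    """Hypothetical record of one answered business question (illustration only)."""
    business_problem: str           # the problem that prompted the question
    question: str                   # the question as the business expert phrased it
    tables: dict[str, list[str]]    # source system -> tables that held the data
    sql_statements: dict[str, str]  # source system -> SQL used to collect the data

record = AnalysisRecord(
    business_problem="Churn is rising in the EMEA region",
    question="Which customer segments churned most last quarter?",
    tables={
        "oracle": ["crm.customers", "crm.subscriptions"],
        "snowflake": ["billing.invoices"],
        "hadoop": ["logs.product_usage"],
    },
    sql_statements={
        "oracle": "SELECT c.segment FROM crm.customers c "
                  "JOIN crm.subscriptions s ON s.customer_id = c.id",
        "snowflake": "SELECT invoice_id, amount FROM billing.invoices",
        "hadoop": "SELECT user_id, event FROM logs.product_usage",
    },
)
```

The point of the structure is that the business context (the problem and the question) travels with the technical artifacts (the tables and the SQL), so neither can be orphaned when an analyst leaves.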

You could have people input these questions into some kind of documentation scheme. But ask yourself: how likely are they to do that? As they say in the UK, not bloody likely. Additionally, the high-level business question will in many cases have been asked by a business expert, not the analyst who did the data munging, making it even tougher to collect all the information in a cohesive format.

This Is a Job for Machine Learning

Clearly, human beings can’t be trusted with this kind of tedious documentation (if you require your data scientists to do this, they’ll probably just get different jobs). The process has to be automated from beginning to end. Data catalogs offer a start by organizing the data, but they don’t give you the context in which it might be useful to your organization–that information resides only in the brains of your data science team. Here’s what companies need to capture and retain it:

- Automatic capture of the business questions being asked of the data
- Automatic logging of the queries that answered them, along with the systems and tables they touched
- Machine learning to tie questions, queries, and data sets together without manual data entry (a naive sketch of the capture step follows)
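
As a toy illustration of the automated capture step, the snippet below pulls the tables out of a SQL statement so they can be indexed alongside the question. A production system would read the database’s own query logs and use a real SQL parser rather than a regex–this is only a sketch of the idea:

```python
import re

# Naive illustration: find table names after FROM/JOIN keywords.
TABLE_PATTERN = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)

def tables_referenced(sql: str) -> set[str]:
    """Return the set of tables a captured SQL statement touches."""
    return set(TABLE_PATTERN.findall(sql))

sql = """
SELECT c.segment, COUNT(*) AS churned
FROM crm.customers c
JOIN crm.subscriptions s ON s.customer_id = c.id
GROUP BY c.segment
"""
print(tables_referenced(sql))  # {'crm.customers', 'crm.subscriptions'}
```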

And to complete the loop, all of this information has to be captured in a repository that makes it searchable by the original question. With this in place, invaluable information about the location and state of data sets can be retained indefinitely. An analyst faced with a business question merely poses that question to the system (much like a Google search). If that question was posed by one of their predecessors, they’re guided to the full results of that earlier work–all of their predecessor’s knowledge is at their fingertips, so they can build on top of what was done before rather than rebuilding it from scratch. If that sounds like science fiction, read on.
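
To show the shape of that search loop, here is a deliberately crude sketch using keyword overlap to match a new question against previously captured ones. A real system would use semantic search rather than word counting, and the repository entries here are invented for illustration:

```python
def similarity(a: str, b: str) -> float:
    """Crude keyword-overlap score; a stand-in for real semantic search."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

# Tiny stand-in repository: (original question, captured SQL) pairs.
repository = [
    ("Which customer segments churned most last quarter?",
     "SELECT c.segment, COUNT(*) FROM crm.customers c "
     "JOIN crm.subscriptions s ON s.customer_id = c.id GROUP BY c.segment"),
    ("What drove support ticket volume in Q3?",
     "SELECT category, COUNT(*) FROM support.tickets GROUP BY category"),
]

new_question = "Which segments are churning the most?"
question, sql = max(repository, key=lambda r: similarity(new_question, r[0]))
print(question)  # the predecessor's question...
print(sql)       # ...and their query, ready to build on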
