Don't call it Data Governance

Idea: What if we stop calling it Data Governance? 

Data Governance elicits feelings of boredom and numbness in my brain. Governance rhymes with Compliance. When was the last time you got excited about Compliance? Yeah, I didn’t think so.

What is Data Governance? Let’s start with what it is not. It is not the tech side of things. We got our cloud infrastructure, raw data sources, transformation pipelines, specific data models, and dashboards. We got wrangled data sets for machine learning; we got regression and deep learning models. There’s a lot of SQL, Python, R code moving all the 0s and 1s around. That’s the tech side. 

The complement to the tech side is the context. It’s the subject matter, the meaning, the business logic, the why, the how.

Imagine we are trying to calculate the lifetime value (LTV) of a healthcare system patient. I know, I am crazy to apply a standard marketing metric to healthcare, but indulge me. We might have the best data scientists east of the Mississippi; they can build models in their sleep. But our brilliant developers have no idea about the ins and outs of healthcare patient revenue. Spoiler alert: it’s loaded with complexity.

They don’t know that some patients' LTV is based on their Medicare Advantage risk-adjusted capitated payments (capitation). For those patients, we get revenue based on membership, not on services provided. Then other patients just come in when they need a flu shot. The healthcare system gets paid every time they visit (fee-for-service). And then there are denials and write-offs to factor in; the healthcare revenue cycle is a beast.

To get to our patients’ LTV, we need to understand all these subtleties and carefully define the metric calculation for different patient tranches. We need to work together with the people who know the little details inside and out. We need to write it down; we don’t want anyone else starting from scratch (templates, business glossary). We need to check that our business definition matches our code (data validation, data integrity). We need someone on the business side to be our partner; they’ll help us validate, they’ll tell us what’s working and what’s useless, they’ll answer our questions, even the stupid ones (data stewards). We need a way to keep track of all the code, data models, reports, and dashboards that are related to this metric (data lineage, data dictionary).
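To make the "different calculation per tranche" idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the field names, the two payment-model labels, and the simplified formulas are made up, and a real definition would come from the people who know the revenue cycle, not from me.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # All fields are hypothetical; a real model would have many more.
    patient_id: str
    payment_model: str               # "capitated" or "fee_for_service"
    monthly_capitation: float = 0.0  # risk-adjusted payment per member per month
    expected_months: int = 0         # expected months of membership
    visits_per_year: float = 0.0
    revenue_per_visit: float = 0.0
    expected_years: float = 0.0
    denial_rate: float = 0.0         # share of billed revenue lost to denials and write-offs


def lifetime_value(p: Patient) -> float:
    """Simplified LTV: one rule per payment model (an assumed business definition)."""
    if p.payment_model == "capitated":
        # Revenue follows membership, not services provided.
        return p.monthly_capitation * p.expected_months
    if p.payment_model == "fee_for_service":
        # Revenue follows visits, reduced by denials and write-offs.
        billed = p.visits_per_year * p.revenue_per_visit * p.expected_years
        return billed * (1 - p.denial_rate)
    # Fail loudly on a tranche nobody has defined yet.
    raise ValueError(f"Unknown payment model: {p.payment_model}")


if __name__ == "__main__":
    member = Patient("A1", "capitated", monthly_capitation=900.0, expected_months=60)
    walk_in = Patient("B2", "fee_for_service", visits_per_year=2,
                      revenue_per_visit=150.0, expected_years=10, denial_rate=0.25)
    print(lifetime_value(member), lifetime_value(walk_in))
```

The point is not the arithmetic. It is that each branch encodes a business rule someone has to explain, write down, and validate.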

We need to understand, organize, and keep track of the context that sits on top of our technology. This is data governance. But when I describe it above, it doesn’t sound dull or scary. It’s all the other stuff around your code that makes what we are doing valuable to the business. It’s the fun stuff - it’s where the impact happens.

So what if we stopped calling it Data Governance and started calling it Data Context instead? My eyes would glaze over less.

#datacontext by #datapavel


Digital twins: what starts with a wind turbine ends with a digital Pavel?

I love sci-fi kind of stuff, so when my buddy mentioned digital twins on our first podcast episode, my ears perked up. 

What are digital twins? In essence, they are digital copies of real-world objects; they are computer simulations, lots of code, algorithms, and data all meshed together. 

Wikipedia offers a more elegant definition: ‘A digital twin is a digital replica of a living or non-living physical entity. By bridging the physical and the virtual world, data is transmitted seamlessly, allowing the virtual entity to exist simultaneously with the physical entity.’ 

Simulations have existed for a long time. Lots of us, myself included, have taken a simulation class. My class project was simulating the checkout lines at a Duane Reade in New York, ohh, the excitement!

So what’s different now, what’s with all the buzz?

Various technologies have matured and teamed up to make digital twin simulations robust enough to be pretty good digital copies of real objects.

The most significant factor is the so-called Internet of Things. We now have lots and lots of sensors and can collect and process data in real time from all sorts of equipment, from the space shuttle engine to your smart fridge. (You don’t have a smart fridge? What, are you living in 1990?)

See, a digital twin is not just some code written by humans; it’s taking in real-time data from the physical object and adjusting the digital twin to match. 

Ok, lots of sensor data is coming in, but you still need a way to make sense of it. Here come our favorites: AI and machine learning algorithms can take in all that data and magically (mathematically) create a ‘living’ virtual model. 

IoT sensor data, machine learning, cloud computing all come together to make this happen.
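Here is a toy Python sketch of that loop: readings come in, and the twin adjusts its state to match. To be clear about the assumptions, the class name, the single rotor-speed value, and the use of simple exponential smoothing are all mine; a real digital twin would combine physics models and machine learning over many sensors.

```python
class TurbineTwin:
    """Toy digital twin that tracks one sensor value (hypothetical rotor speed)."""

    def __init__(self, initial_rpm: float, smoothing: float = 0.5):
        self.rpm = initial_rpm      # the twin's current belief about the real turbine
        self.smoothing = smoothing  # 0 = ignore new readings, 1 = copy them exactly

    def ingest(self, measured_rpm: float) -> float:
        # Move the twin's state part of the way toward the new sensor reading
        # (exponential smoothing, a stand-in for a real model).
        self.rpm += self.smoothing * (measured_rpm - self.rpm)
        return self.rpm

    def is_anomalous(self, measured_rpm: float, tolerance: float = 5.0) -> bool:
        # Flag readings that are far from what the twin expects.
        return abs(measured_rpm - self.rpm) > tolerance


if __name__ == "__main__":
    twin = TurbineTwin(initial_rpm=10.0)
    for reading in [14.0, 12.0, 16.0]:
        print(reading, twin.ingest(reading))
    print(twin.is_anomalous(30.0))
```

Swap the smoothing step for a trained model and the single number for thousands of sensor streams, and you get the general shape of the real thing.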

Today, the applications are mostly for large industrial equipment. GE is using the framework to improve wind farm operations by building a full digital wind farm. NASA is using it to test next-generation spacecraft before building them.

That’s the jelly in this donut; you can build a whole production line out of digital twins and then experiment without actually doing any of the expensive physical testing.

Can this concept be applied to living things? To humans like you and me?

side note - am I human or am I data?

Can you imagine a digital copy of yourself in your EMR, updated continuously based on your real-time data: calories, steps, sleep, medications, and biometric data like heart rate and blood pressure?

If we have enough data to build a virtual copy, can we test a drug on a person without testing it on the actual person?

Can we simulate based on an individual’s genetic code and their gut microbiome? Is this the future of personalized medicine? I am getting excited.

I think we are still quite some time away from perfect digital copies of our bodies, but I can see it happening in the next 20 years. One thing is for sure: we are going to need to store and process all that data. That means more opportunity for big tech and more opportunity for anyone who likes to work with data.

Data data data everywhere, with no signs of slowing down.