Research & Design

Trusting a Distributed Data Pipeline

Duration: 40 minutes

Speaker

Richard Guy

Principal Applied Researcher

Specializing in data mining, modeling, and engineering, Richard works with a team of data mining researchers at Microsoft.

About The Session

Conclusions you reach with data are only valid if they correctly interpret your data set. In many organizations, the responsibility for collecting and aggregating data is distributed, so it can be hard to ensure that everyone who uses a data set understands the limitations of the signals in that pipeline.

As an example, many companies make important decisions about what events constitute an “active user,” and these decisions are reflected in the pipeline code. Changes to a pipeline may not be communicated to all downstream users, leading to misinformed conclusions even from correctly executed analyses.
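As a rough illustration of how such a decision ends up embedded in pipeline code, the sketch below computes "active users" under two definitions. It is not taken from the talk: the event names, the 28-day window, and the `active_users` helper are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical event names. Which events count as "activity" is a business
# decision, and here it lives in a constant that downstream users may never see.
QUALIFYING_EVENTS_V1 = frozenset({"login", "page_view", "purchase"})
QUALIFYING_EVENTS_V2 = frozenset({"purchase"})  # stricter: passive events no longer count


@dataclass(frozen=True)
class Event:
    user_id: str
    name: str
    day: date


def active_users(events, as_of, qualifying_events, window_days=28):
    """Return user IDs with at least one qualifying event in the window ending on as_of."""
    start = as_of - timedelta(days=window_days - 1)
    return {
        e.user_id
        for e in events
        if e.name in qualifying_events and start <= e.day <= as_of
    }


# Made-up sample data, for illustration only.
events = [
    Event("u1", "login", date(2024, 1, 10)),
    Event("u2", "purchase", date(2024, 1, 12)),
    Event("u3", "page_view", date(2023, 11, 1)),  # falls outside the 28-day window
]
as_of = date(2024, 1, 15)

v1 = active_users(events, as_of, QUALIFYING_EVENTS_V1)
v2 = active_users(events, as_of, QUALIFYING_EVENTS_V2)
print(sorted(v1), sorted(v2))
```

The same events yield a different "active user" count depending on which definition the pipeline applies, so an analyst who is unaware of a switch from one definition to the other could read the drop as a change in user behavior.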

In this talk, Richard will share three key questions to help ensure that you are interpreting your data correctly and drawing accurate conclusions.

Key Questions

For data consumers: what business decisions are implicit rather than explicit in the data that you are using?
For data producers: who is using your data, and are they aware of changes that you make?
For organizations: how does your organization prevent unintentional changes to the meaning of a data set?
