Movatterモバイル変換


[0]ホーム

URL:


Skip to content
DEV Community
Log in Create account

DEV Community

Cover image for What is Data Profiling?
Preeti Hemant
Preeti Hemant

Posted on

     

What is Data Profiling?

Imagine this scenario...

You have started a new job as a Data Engineer. It's an exciting time, new data, new tools, new people! While you are familiarizing yourself, your manager who is also very excited to have you look at their backlog, sends a request across.

"One of our stakeholders has been waiting on this for a long time - could you look intothis job that's resulting in a very large table in the data warehouse. The stakeholder would love an aggregated table with the data rolled up to a week."

What is one of the first things you do?

Data Profiling!

It's a systematic analysis of the data in a table to understand the structure and relationships - leading to conclusions on its usability.

The tools to extract this information are measures in descriptive statistics - min, max, mode, frequency, sum, count (You get the picture).
The next set of parameters to understand are data types, nulls, uniqueness which answer questions on missing data, duplicates and so on.

You could run this analysis on individual columns or do a cross-columns analysis.

At the end of this exercise, you should be able to make a few decisions
1) Is this data able to support the analysis/reporting use case?
2) Do the data quality issues originate from the source or from the ETL jobs?
3) Are there inconsistencies in the structure of the table?

So, this was a short introduction to the Data Profiling technique. For further reading, here are some links.

  1. http://www.kimballgroup.com/wp-content/uploads/2012/05/DT59SurprisingValue.pdf
  2. https://www.researchgate.net/publication/262221918_Data_Profiling_Revisited

Top comments(0)

Subscribe
pic
Create template

Templates let you quickly answer FAQs or store snippets for re-use.

Dismiss

Are you sure you want to hide this comment? It will become hidden in your post, but will still be visible via the comment'spermalink.

For further actions, you may consider blocking this person and/orreporting abuse

  • Location
    Canada
  • Education
    Arizona State University
  • Work
    Data Leader
  • Joined

More fromPreeti Hemant

DEV Community

We're a place where coders share, stay up-to-date and grow their careers.

Log in Create account

[8]ページ先頭

©2009-2025 Movatter.jp