Data distribution in a column

Data distribution measures how often each value occurs in a column. Because computing a distribution over potentially millions of rows is expensive, distribution is available only for indexed columns. Indexing a column essentially flags it so that the underlying data is stored for on-the-fly calculations.

Fig: Data distribution for a specific column in a dataset

Data distribution helps detect changes in the underlying dataset, which can be a proxy for quality issues. To view data distribution:

Locate the dataset in the list on the Datasets page, then open the Dashboard tab.
Scroll to the bottom of the page and use the page filter to navigate to the correct page. You can ignore this filter if it offers no other options.
Choose ‘indexed columns’ from the columns dropdown to show only indexed columns.
Click the graph icon next to the column name. This opens a pop-up showing the data distribution.
Use the graph to see how the distribution is trending. You can also click on different datasets in the graph to load their value distribution.

The data distribution graph shows only the 10 most frequently occurring values in the column.
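To make the idea concrete, here is a minimal Python sketch of what a value distribution is and how a shift in it can be spotted. It is an illustration only, not the product's implementation: the function names `top_values` and `share_changes` and the sample data are hypothetical, and the default limit of 10 simply mirrors the graph described above.

```python
from collections import Counter


def top_values(values, limit=10):
    """Return (value, count, share) for the `limit` most frequent values."""
    counts = Counter(values)
    total = sum(counts.values())
    # most_common() returns an empty list for empty input, so no division by zero
    return [(value, count, count / total)
            for value, count in counts.most_common(limit)]


def share_changes(old_values, new_values):
    """Return the change in share of rows for every value seen in either snapshot."""
    def shares(values):
        counts = Counter(values)
        total = sum(counts.values())
        return {value: count / total for value, count in counts.items()}

    old, new = shares(old_values), shares(new_values)
    # A value missing from one snapshot is treated as having a share of 0
    return {value: new.get(value, 0.0) - old.get(value, 0.0)
            for value in old.keys() | new.keys()}


# Hypothetical snapshots of one column from two loads of the same dataset
yesterday = ["US"] * 6 + ["UK"] * 3 + ["DE"] * 1
today = ["US"] * 3 + ["UK"] * 3 + [""] * 4

print(top_values(today))
print(share_changes(yesterday, today))
```

In this made-up example, the empty string appears among the most frequent values in the second snapshot while "US" loses share, which is the kind of change that may point to a quality issue upstream.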

Topics in this section: