CUME_DIST: Calculating Cumulative Distribution
It's **Day 71**. Today we look at a niche but important statistical function: `CUME_DIST()`.
What is CUME_DIST?
It stands for **Cumulative Distribution**. It returns the probability that a value in the group is less than or equal to the current row's value.
In plain English: "What proportion of the group is at or below me?"
Formula
`(number of rows with value <= current row value) / (total number of rows)`
The Code Example
SELECT
student_name,
test_score,
CUME_DIST() OVER(ORDER BY test_score) as distribution
FROM exam_results;
Why it's different from PERCENT_RANK
`PERCENT_RANK` starts at 0. `CUME_DIST` always starts at `1/N`. Analysts prefer `CUME_DIST` when they need to know the literal "Count proportion," whereas `PERCENT_RANK` is more about the "Relative gap."
Use Case: Risk Management
In finance, `CUME_DIST` is used to identify the "Value at Risk" (VaR)—predicting the probability of losses based on historical data.
Your Task for Today
Choose a column and calculate its `CUME_DIST`. Notice that the highest value will always be exactly 1.0.
*Day 72: Nth_VALUE—Grabbing that specific 3rd item.*