CUME_DIST: Calculating Cumulative Distribution

It's **Day 71**. Today we look at a niche but important statistical function: `CUME_DIST()`.

What is CUME_DIST?

It stands for **Cumulative Distribution**. It returns the probability that a value in the group is less than or equal to the current row's value.

In plain English: "What proportion of the group is at or below me?"

Formula

`(number of rows with value <= current row value) / (total number of rows)`

The Code Example

SELECT

student_name,

test_score,

CUME_DIST() OVER(ORDER BY test_score) as distribution

FROM exam_results;

Why it's different from PERCENT_RANK

`PERCENT_RANK` starts at 0. `CUME_DIST` always starts at `1/N`. Analysts prefer `CUME_DIST` when they need to know the literal "Count proportion," whereas `PERCENT_RANK` is more about the "Relative gap."

Use Case: Risk Management

In finance, `CUME_DIST` is used to identify the "Value at Risk" (VaR)—predicting the probability of losses based on historical data.

Your Task for Today

Choose a column and calculate its `CUME_DIST`. Notice that the highest value will always be exactly 1.0.

*Day 72: Nth_VALUE—Grabbing that specific 3rd item.*