Ever wished you could perform complex calculations across your data while preserving every detail? Enter SQL Analytical Functions. These powerhouse features give you the best of both worlds: the ability to analyze groups of related rows without sacrificing the granularity of your data that traditional aggregate functions often require.
Table of Content
- Introduction
- Understanding Analytical Functions Syntax
- Common Analytical Functions
- Understanding Aggregate Functions as Analytical Functions
- Best Practices for Analytical Functions
- Conclusion
- FAQs
- Q1. What's the difference between analytical functions and aggregate functions?
- Q2. What's the purpose of the PARTITION BY clause?
- Q3. How does ORDER BY affect analytical function results?
- Q4. What's a window frame and why is it important?
- Q5. Can analytical functions be used in WHERE clauses?
- Q6. What are the most commonly used analytical functions?
- Q7. How do you handle NULL values in analytical functions?
- Learn more about SQL and other topics
Introduction
SQL Analytical Functions, also known as window functions, are powerful tools that perform calculations across a set of rows related to the current row. Unlike regular aggregate functions, analytical functions don’t collapse the result set, allowing you to maintain the individual row details while performing complex calculations.
Understanding Analytical Functions Syntax
Basic Analytical Functions Structure
FUNCION_NAME(<arguments>)
OVER ([PARTITION BY column1[, column2,...]]
[ORDER BY column1 [ASC|DESC][, column2,...]]
[ROWS|RANGE frame_clause])
Frame Clause Syntax
[ROWS | RANGE] BETWEEN frame_start AND frame_end
frame_start and frame_end options:
- UNBOUNDED PRECEDING
- n PRECEDING
- CURRENT ROW
- n FOLLOWING
- UNBOUNDED FOLLOWING
Common Analytical Functions
1. Ranking Functions
ROW_NUMBER()
Assigns a unique number to each row within a partition based on the order by clause.
Syntax
ROW_NUMBER()
OVER (
[PARTITION BY partition_expression]
ORDER BY sort_expression [ASC|DESC]
)
Example: Rank employees by salary within each department
-- Example: Rank employees by salary within each department
SELECT
employee_id,
department_id,
salary,
ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) as salary_rank
FROM employees;
-- Output:
-- EMPLOYEE_ID DEPARTMENT_ID SALARY SALARY_RANK
-- 100 10 5000 1
-- 101 10 4500 2
-- 102 20 6000 1
-- 103 20 4800 2
Explanation:
PARTITION BY department_id
breaks the data into groups by department_id.ROW_NUMBER()
gives a unique number within each partition.
RANK() and DENSE_RANK()
RANK() leaves gaps for ties, while DENSE_RANK() provides consecutive rankings.
Syntax
RANK()|DENSE_RANK()
OVER (
[PARTITION BY partition_expression]
ORDER BY sort_expression [ASC|DESC]
)
Example: Compare RANK vs DENSE_RANK for employee salaries
-- Example: Compare RANK vs DENSE_RANK for employee salaries
SELECT
employee_id,
salary,
RANK() OVER (ORDER BY salary DESC) as salary_rank,
DENSE_RANK() OVER (ORDER BY salary DESC) as salary_dense_rank
FROM employees;
-- Output:
-- EMPLOYEE_ID SALARY SALARY_RANK SALARY_DENSE_RANK
-- 102 6000 1 1
-- 100 5000 2 2
-- 103 4800 3 3
-- 101 4500 4 4
-- 104 4500 4 4
-- 105 4200 6 5
-- 106 4000 7 6
Explanation:
RANK()
gives the rank of each row within a partition, with gaps for ties.- If two rows have the same
salary
, they’ll receive the same rank, and the next rank will be skipped. - In case of
DENSE_RANK()
you will have no gaps between ranks i.e if two rows have the samesalary
, they’ll receive the same rank but the next rank will not be skipped. - For employee 105, rank is 6 where as dense rank is 5
NTILE(): Breaking Data into Buckets
Divides rows into specified number of groups.
Syntax
NTILE(number_of_buckets)
OVER (
[PARTITION BY partition_expression]
ORDER BY sort_expression [ASC|DESC]
)
Example: Divide employees into 4 salary quartiles
-- Example: Divide employees into 4 salary quartiles
SELECT
employee_id,
salary,
NTILE(4) OVER (ORDER BY salary DESC) as salary_quartile
FROM employees;
-- Output:
-- EMPLOYEE_ID SALARY SALARY_QUARTILE
-- 102 6000 1
-- 100 5000 1
-- 103 4800 2
-- 101 4500 3
Explanation:
NTILE(4)
splits the data into four roughly equal parts based on thesalary
column.- Useful for percentile-based reporting.
2. Value Functions
LAG() and LEAD()
Access data from previous (LAG) or next (LEAD) rows.
Syntax
LAG|LEAD(expression[, offset[, default_value]])
OVER (
[PARTITION BY partition_expression]
ORDER BY sort_expression [ASC|DESC]
)
Example: Compare current salary with previous employee’s salary
-- Example: Compare current salary with previous employee's salary
SELECT
employee_id,
salary,
LAG(salary) OVER (ORDER BY employee_id) as prev_salary,
LEAD(salary) OVER (ORDER BY employee_id) as next_salary
FROM employees;
-- Output:
-- EMPLOYEE_ID SALARY PREV_SALARY NEXT_SALARY
-- 100 5000 NULL 4500
-- 101 4500 5000 6000
-- 102 6000 4500 4800
Explanation:
LAG()
retrieves the value from the previous row (previous employee’s salary).LEAD()
retrieves the value from the next row (next employee’s salary).
FIRST_VALUE() and LAST_VALUE()
Retrieve first or last value in a window frame.
Syntax
FIRST_VALUE|LAST_VALUE(expression)
OVER (
[PARTITION BY partition_expression]
ORDER BY sort_expression [ASC|DESC]
[ROWS|RANGE frame_clause]
)
Example: Compare each salary with highest salary in department. You can use the function LAST_VALUE()
to get minimum salary of the department.
-- Example: Compare each salary with highest salary in department
SELECT
employee_id,
department_id,
salary,
FIRST_VALUE(salary) OVER (
PARTITION BY department_id
ORDER BY salary DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) as highest_dept_salary,
LAST_VALUE(salary) OVER (
PARTITION BY department_id
ORDER BY salary DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) as lowest_dept_salary
FROM employees;
-- Output:
-- EMPLOYEE_ID DEPARTMENT_ID SALARY HIGHEST_DEPT_SALARY LOWEST_DEPT_SALARY
-- 100 10 5000 5000 4500
-- 101 10 4500 5000 4500
-- 102 20 6000 6000 4800
-- 103 20 4800 6000 4800
Explanation:
FIRST_VALUE()
fetches the first salary in the window, whileLAST_VALUE()
fetches the last one.- The range in
LAST_VALUE()
ensures it looks at all rows in the partition.
3. Statistical Functions
PERCENT_RANK() and CUME_DIST()
Calculate relative rankings within a result set.
Syntax
PERCENT_RANK|CUME_DIST()
OVER (
[PARTITION BY partition_expression]
ORDER BY sort_expression [ASC|DESC]
)
Example: Calculate percentage ranking of salaries
-- Example: Calculate percentage ranking of salaries
SELECT
employee_id,
salary,
ROUND(PERCENT_RANK() OVER (ORDER BY salary), 2) as percent_rank,
ROUND(CUME_DIST() OVER (ORDER BY salary), 2) as cume_dist
FROM employees;
-- Output:
-- EMPLOYEE_ID SALARY PERCENT_RANK CUME_DIST
-- 101 4500 0.00 0.20
-- 103 4800 0.25 0.40
-- 100 5000 0.50 0.60
-- 102 6000 0.75 1.00
Advanced Usage: Combining Multiple Analytical Functions
-- Example: Comprehensive employee analysis
SELECT
employee_id,
department_id,
salary,
ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) as dept_salary_rank,
FIRST_VALUE(salary) OVER (
PARTITION BY department_id
ORDER BY salary DESC
) as highest_dept_salary,
ROUND(salary / SUM(salary) OVER (PARTITION BY department_id) * 100, 2) as salary_percentage
FROM employees;
-- Output:
-- EMPLOYEE_ID DEPARTMENT_ID SALARY DEPT_SALARY_RANK HIGHEST_DEPT_SALARY SALARY_PERCENTAGE
-- 100 10 5000 1 5000 52.63
-- 101 10 4500 2 5000 47.37
-- 102 20 6000 1 6000 55.56
-- 103 20 4800 2 6000 44.44
Understanding Aggregate Functions as Analytical Functions
When aggregate functions (SUM, COUNT, AVG, MIN, MAX) are used with the OVER clause, they become powerful analytical tools that can perform calculations across specified sets of rows while maintaining the detail-level data.
SUM() as Analytical Function
The SUM analytic function calculates running totals and partitioned sums while preserving row detail.
--Example 1: Calculate running total of sales by department
SELECT
department_id,
employee_id,
sale_amount,
SUM(sale_amount) OVER (
PARTITION BY department_id
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) as running_total,
SUM(sale_amount) OVER (PARTITION BY department_id) as dept_total
FROM sales;
-- Output:
-- DEPT_ID EMP_ID SALE_AMOUNT RUNNING_TOTAL DEPT_TOTAL
-- 10 101 1000 1000 3500
-- 10 102 1500 2500 3500
-- 10 103 1000 3500 3500
-- 20 201 2000 2000 5000
-- 20 202 3000 5000 5000
-- Example 2: Calculate percentage of department total
SELECT
department_id,
employee_id,
sale_amount,
ROUND(
sale_amount / SUM(sale_amount) OVER (PARTITION BY department_id) * 100,
2
) as pct_of_dept_total
FROM sales;
-- Output:
-- DEPT_ID EMP_ID SALE_AMOUNT PCT_OF_DEPT_TOTAL
-- 10 101 1000 28.57
-- 10 102 1500 42.86
-- 10 103 1000 28.57
-- 20 201 2000 40.00
-- 20 202 3000 60.00
Explanation: For example 1
- The
SUM()
function calculates the cumulative sum of theamount
column. OVER (ORDER BY sale_date)
tells SQL to compute the sum in a running order based onsale_date
.
COUNT() as Analytical Function
COUNT analytics can track running counts and provide insights about data distribution.
-- Example 1: Running count of orders by customer
SELECT
customer_id,
order_date,
order_amount,
COUNT(*) OVER (
PARTITION BY customer_id
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) as order_count,
COUNT(*) OVER (PARTITION BY customer_id) as total_customer_orders
FROM orders;
-- Output:
-- CUST_ID ORDER_DATE ORDER_AMOUNT ORDER_COUNT TOTAL_CUSTOMER_ORDERS
-- C1 2024-01-01 500 1 3
-- C1 2024-01-15 700 2 3
-- C1 2024-02-01 600 3 3
-- C2 2024-01-05 800 1 2
-- C2 2024-01-20 900 2 2
-- Example 2: Calculate density of orders by date range
SELECT
order_date,
COUNT(*) OVER (
ORDER BY order_date
RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW
) as orders_last_7_days
FROM orders;
-- Output:
-- ORDER_DATE ORDERS_LAST_7_DAYS
-- 2024-01-01 1
-- 2024-01-05 2
-- 2024-01-15 3
-- 2024-01-20 2
Explanation: For example 1
- The
COUNT()
function calculates the cumulative count of theorder
for each customer. OVER (ORDER BY order_date)
tells SQL to compute the count in a running order based onorder_date
.
AVG() as Analytical Function
AVG analytics are perfect for calculating moving averages and comparing values to averages.
-- Example 1: Calculate 3-day moving average of sales
SELECT
sale_date,
daily_sales,
ROUND(
AVG(daily_sales) OVER (
ORDER BY sale_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
),
2
) as moving_avg_3day
FROM daily_sales;
-- Output:
-- SALE_DATE DAILY_SALES MOVING_AVG_3DAY
-- 2024-01-01 1000 1000.00
-- 2024-01-02 1200 1100.00
-- 2024-01-03 1500 1233.33
-- 2024-01-04 1300 1333.33
-- Example 2: Compare to department average
SELECT
department_id,
employee_id,
salary,
ROUND(
AVG(salary) OVER (PARTITION BY department_id),
2
) as dept_avg_salary,
ROUND(
(salary - AVG(salary) OVER (PARTITION BY department_id)) /
AVG(salary) OVER (PARTITION BY department_id) * 100,
2
) as pct_diff_from_avg
FROM employees;
-- Output:
-- DEPT_ID EMP_ID SALARY DEPT_AVG_SALARY PCT_DIFF_FROM_AVG
-- 10 101 5000 4750.00 5.26
-- 10 102 4500 4750.00 -5.26
-- 20 201 6000 5400.00 11.11
-- 20 202 4800 5400.00 -11.11
Explanation: For example 1
- The
AVG()
function calculates the average. ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
specifies a window of three rows (the current row and two preceding ones) over which to calculate the average.
MIN() and MAX() as Analytical Functions
MIN and MAX analytics help identify boundaries and ranges within data sets.
-- Example 1: Track high and low points in each quarter
SELECT
quarter,
sale_date,
sale_amount,
MIN(sale_amount) OVER (PARTITION BY quarter) as quarter_low,
MAX(sale_amount) OVER (PARTITION BY quarter) as quarter_high,
sale_amount - MIN(sale_amount) OVER (PARTITION BY quarter) as amount_above_low
FROM quarterly_sales;
-- Output:
-- QUARTER SALE_DATE SALE_AMOUNT QUARTER_LOW QUARTER_HIGH AMOUNT_ABOVE_LOW
-- Q1 2024-01-15 1000 1000 1500 0
-- Q1 2024-02-01 1500 1000 1500 500
-- Q1 2024-03-15 1200 1000 1500 200
-- Q2 2024-04-01 1800 1300 1800 500
-- Q2 2024-05-15 1300 1300 1800 0
-- Example 2: Rolling high/low comparison
SELECT
trade_date,
stock_price,
MIN(stock_price) OVER (
ORDER BY trade_date
ROWS BETWEEN 5 PRECEDING AND CURRENT ROW
) as rolling_low,
MAX(stock_price) OVER (
ORDER BY trade_date
ROWS BETWEEN 5 PRECEDING AND CURRENT ROW
) as rolling_high,
ROUND(
(stock_price - MIN(stock_price) OVER (
ORDER BY trade_date
ROWS BETWEEN 5 PRECEDING AND CURRENT ROW
)) / MIN(stock_price) OVER (
ORDER BY trade_date
ROWS BETWEEN 5 PRECEDING AND CURRENT ROW
) * 100,
2
) as pct_above_low
FROM stock_prices;
-- Output:
-- TRADE_DATE STOCK_PRICE ROLLING_LOW ROLLING_HIGH PCT_ABOVE_LOW
-- 2024-01-01 100.00 100.00 100.00 0.00
-- 2024-01-02 102.50 100.00 102.50 2.50
-- 2024-01-03 101.75 100.00 102.50 1.75
-- 2024-01-04 103.25 100.00 103.25 3.25
Explanation: For example 1
- The
MIN()
function calculates the minimumsale_amount
of the each quarter. - The
MAX()
function calculates the maximumsale_amount
of the each quarter.
LISTAGG() as Analytical Functions
LISTAGG is a powerful analytical function that aggregates data from multiple rows into a single concatenated string. When used with the OVER clause, it provides unique capabilities for creating comma-separated lists while maintaining row-level details.
Syntax
LISTAGG(measure_column [, 'delimiter']) WITHIN GROUP (ORDER BY sort_column)
OVER (PARTITION BY partition_column)
Basic Employee Skills Aggregation
-- Create a sample employee skills table
CREATE TABLE employee_skills (
employee_id NUMBER,
department_id NUMBER,
skill_name VARCHAR2(50)
);
-- Sample data
INSERT INTO employee_skills VALUES (101, 10, 'Java');
INSERT INTO employee_skills VALUES (101, 10, 'Python');
INSERT INTO employee_skills VALUES (101, 10, 'SQL');
INSERT INTO employee_skills VALUES (102, 10, 'Python');
INSERT INTO employee_skills VALUES (102, 10, 'R');
INSERT INTO employee_skills VALUES (103, 20, 'JavaScript');
INSERT INTO employee_skills VALUES (103, 20, 'HTML');
-- Basic LISTAGG query
SELECT
employee_id,
department_id,
LISTAGG(skill_name, ', ') WITHIN GROUP (ORDER BY skill_name)
OVER (PARTITION BY employee_id) as employee_skills
FROM employee_skills;
-- Output:
-- EMPLOYEE_ID DEPARTMENT_ID EMPLOYEE_SKILLS
-- 101 10 Java, Python, SQL
-- 101 10 Java, Python, SQL
-- 101 10 Java, Python, SQL
-- 102 10 Python, R
-- 102 10 Python, R
-- 103 20 HTML, JavaScript
-- 103 20 HTML, JavaScript
Explanation:
- The
LISTAGG()
function groups the employee skills for each employee and creates a comma-separated lists of skills they have.
Advanced Usage: Combining Multiple Analytical Functions
--Comprehensive sales analysis
SELECT
department_id,
sale_date,
sale_amount,
-- Running total
SUM(sale_amount) OVER (
PARTITION BY department_id
ORDER BY sale_date
) as running_total,
-- Moving average
ROUND(AVG(sale_amount) OVER (
PARTITION BY department_id
ORDER BY sale_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
), 2) as moving_avg,
-- Comparison to department bounds
ROUND(
(sale_amount - MIN(sale_amount) OVER (PARTITION BY department_id)) /
(MAX(sale_amount) OVER (PARTITION BY department_id) -
MIN(sale_amount) OVER (PARTITION BY department_id)) * 100,
2
) as percentile_in_dept
FROM sales;
-- Output:
-- DEPT_ID SALE_DATE SALE_AMOUNT RUNNING_TOTAL MOVING_AVG PERCENTILE_IN_DEPT
-- 10 2024-01-01 1000 1000 1000.00 0.00
-- 10 2024-01-15 1500 2500 1250.00 100.00
-- 10 2024-02-01 1200 3700 1233.33 40.00
-- 20 2024-01-05 2000 2000 2000.00 0.00
-- 20 2024-01-20 3000 5000 2500.00 100.00
Explanation:
- The example shows multiple analytical functions usage in single SQL for complex calculations.
- The SUM() is used to get running total and AVG() is used to calculate moving average.
Best Practices for Analytical Functions
Performance Considerations
- Use appropriate indexes for PARTITION BY and ORDER BY columns
- Limit window size when possible using ROWS clause
- Consider materialized views for frequently accessed calculations
- Consider partitioning for large datasets
Window Frame Considerations
- Default frame: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
- Use ROWS for count-based windows
- Use RANGE for value-based windows
Formatting, Readability and Maintainability
- Use meaningful column aliases
- Break complex analytics into steps using CTEs
- Document window frame specifications
- Round decimal values appropriately
Common Pitfalls and NULL Handling
- Be careful with NULL values in window functions
- Pay attention to window frame boundaries
- Use COALESCE or NVL when necessary
- Consider data type conversions
- Consider excluding NULLs with filtering
Conclusion
Oracle SQL Analytical Functions provide powerful tools for complex data analysis. By understanding these functions and their proper usage, you can significantly improve your data analysis capabilities and write more efficient queries. Remember to consider performance implications when working with large datasets and always test your queries thoroughly.
FAQs
Q1. What’s the difference between analytical functions and aggregate functions?
- Aggregate functions group rows and return a single result per group
- Analytical functions perform calculations across rows while preserving individual row details
- Example: SUM as an aggregate collapses rows, while SUM as an analytical function keeps all rows and shows running totals
Q2. What’s the purpose of the PARTITION BY clause?
- PARTITION BY divides data into groups for processing
- Calculations are performed independently within each partition
- Think of it as a way to segment your data (like by department, region, or year) without losing row-level detail
Q3. How does ORDER BY affect analytical function results?
- ORDER BY determines the sequence of rows for processing
- Critical for functions like LAG, LEAD, FIRST_VALUE, LAST_VALUE
- Impacts calculations involving running totals and moving averages
Q4. What’s a window frame and why is it important?
- Window frame defines the range of rows used for each calculation
- Specified using ROWS or RANGE clause
- Examples: ROWS BETWEEN 3 PRECEDING AND CURRENT ROW, or RANGE UNBOUNDED PRECEDING
Q5. Can analytical functions be used in WHERE clauses?
- No, analytical functions cannot be used in WHERE clauses
- They can only appear in SELECT list or ORDER BY clause
- Use subqueries or CTEs if you need to filter based on analytical function results
Q6. What are the most commonly used analytical functions?
- ROW_NUMBER(): Assigns unique numbers to rows
- LAG/LEAD: Access previous/next row values
- RANK/DENSE_RANK: Assign rankings to rows
- FIRST_VALUE/LAST_VALUE: Retrieve first/last values in a window
- NTILE: Divide rows into specified number of groups
Q7. How do you handle NULL values in analytical functions?
- Most analytical functions ignore NULLs by default
- IGNORE NULLS or RESPECT NULLS clause can be used with certain functions
- NULLs can affect window frame calculations and rankings
Learn more about SQL and other topics
- What are the Most Common SQL Tricks Everyone Should Know?
- How To Use SQL The Best Way: Tips and Tricks
- Learn Analytical Functions by Oracle
- Data Preparation With SQL: An Ultimate Cheat Sheet
- Beginner’s SQL Interview Questions: Get Prepared Now!
- Oracle Interview Questions: Must Know Before Your Big Day
- How To Use SQL? An Ultimate Beginners Guide
- Unmasking The Best of SQL: Test Your Expertise! (2024)