Last day on LinkedIn, I saw a post in my feed about a woman who was making her transistion into the BI field.
In this particular post she posted her solution to a LeetCode problem, the original problem is this one here: https://leetcode.com/problems/second-highest-salary/description/
As you saw, the problem consisted of retrieving the second highest salary within in a Employee table. To replicate the problem you may run the script here to replicate a Employee Salary column (it will create in default schema).
create table employee (
id int identity(1,1),
salary float
)
declare @employee_nr int = 300
declare @i int = 1
while @i < @employee_nr begin
insert into employee (salary) values (round((rand()*10000),0))
set @i = @i + 1
end
delete employee where LEN(salary) < 4
select * from employee
After running this you should have something around 270 records in the table (the anomalys with salaries like 63 have been deleted and only kept the salaries with a length longer then four.
Well after a quick select and order by we see that the record we would like to return is 9921, as that is the second highest earning salary.
The approach the original post gave was using a subquery of smaller then max in the where clausule, which is interesting and yields the desired result.
select max(salary) from employee where salary < (select max(salary) from employee)
It does indeed yield the second highest salary, but is this the best approach? or let’s say it different, which other approaches could we use so we can find the ninth employee or even the next to lowest earning employee?
One of this approaches was mentioned a few times in the comment section (it was quite a popular LinkedIn post). It was using a combination of a common table expression together with a window function and row_number() to create a sequence in which later could be selected which position of the salary would be wanted to be visible. Let’s see below how this looks.
with cte as
(
select row_number() over (order by salary desc) row_num
, salary
from employee
)
select salary
from cte
where row_num = 2
Great so this yields the same result as before! Except now we can change the number in the where row_num = 2 to another number to be more flexibel and dynamic. BUT.. as there always is a BUT..
As you can see, the CTE/Window function approach is about twice as expensive to execute, the reason here is within the order by in the window function. On this scale it does not matter, but it is something to consider in queries with large tables.
The last approach I will show here is the OFFSET-FETCH, which I did not see return in that post. It is quite simple. you order a select statement by the salary and offset 1 row and fetch 1 row only. It is in my opinion the fastest query to write. It could offer some flexibility by implementing a variable with integer value into the offset to offer a bit more flexibility.
select salary from employee
order by salary desc
offset 1 rows
fetch next 1 rows only
But of course we want to see the execution path relative to the other queries so here we go.
Relative to the CTE/Window function approach it is slightly cheaper to run the offset method, but compared to the subquery it still makes it expensive, however the ability to implement a int variable with a number to offset by other amount of records it would be easily implemented as below.
declare @nr int = 1
select salary from employee
order by salary desc
offset @nr rows
fetch next 1 rows only
As you can see there are many approaches that would lead to the same result, some more efficient then others, and in the early stages of designing your database it is important to make the correct decisions as it could influence performance later on. And that could be a PITA.
And as always the consideration is to be made, time consumed coding, performance tweaking and performance of the database.
Thanks for reading and all feedback is welcome!