Whilst preparing for Data Science interviews, I brushed up on SQL. Here is a question and my thought process in answering.

The problem comes from InterviewQs but I have no affiliation to them. In signing up to their free plan, you are emailed Data Science interview questions 3 times a week. I have found this website useful in preparing for the technical side of the application process.

We begin by thinking about what the resulting table should look like. We want the table to show the channel (online), month, total revenue for the month, total revenue for the previous month and revenue growth (as calculated by the formula given.) The code below shows the initial selection.

 SELECT channel,
month,
m_revenue,
prev_m_revenue,
(m_revenue-prev_m_revenue)/(prev_m_revenue) AS growth

The next step is to define m_revenue and prev_m_revenue .

To establish m_revenue we need to group by month and channel and then select only the online channel as that’s what we’re after. We can therefore consider the following intermediate step to address this part.

SELECT channel,
month,
sum(revenue) AS m_revenue
FROM orders
GROUP BY month, channel
HAVING channel = 'Online'

To get the previous month’s revenue, we can use theLAG function which allows access to a row at a specified physical offset which comes before the current row. In particular, it allows us to access the row before, which will have the previous month’s revenue.

The syntax of the LAG function is given below:

LAG (scalar_expression [,offset] [,default])OVER ( [ partition_by_clause ] order_by_clause )

In our case, we would have

LAG (SUM(revenue),1,0)
OVER (ORDER BY month)

Combining these two steps, our subquery is as follows:

SELECT channel,
month,
sum(revenue) AS m_revenue,
LAG(SUM(revenue),1,0) OVER (ORDER BY month) AS prev_m_revenue
FROM orders
GROUP BY month, channel
HAVING channel = 'Online'

Using this subquery with our initial selection and formula for growth, the final answer is obtained with the query below.

SELECT channel,
month,
m_revenue,
prev_m_revenue,
(m_revenue-prev_m_revenue)/(prev_m_revenue) AS growth
FROM(
SELECT channel,
month,
sum(revenue) AS m_revenue,
LAG(SUM(revenue),1,0) OVER (ORDER BY month) AS prev_m_revenue
FROM orders
GROUP BY month, channel
HAVING channel = 'Online') AS rev_table

Please let me know if you have an alternative solution or can think of any ways to optimise!

Mathematics graduate and Data Scientist

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store