Chapters

SQL Language

Posted by: Jaspreet

Last Updated on: 09 Oct, 2021


Interview Questions


Chapters




    5 tips you can use to get that data ready...
    1) Find "holes" in your data:
        -- missing data in a particular column
        WHERE column_a IS NULL

        -- patterns associated with holes
        SELECT
        column_b,
        COUNT(*) AS records
        FROM tablename
        WHERE column_a IS NULL
        GROUP BY column_b
        ORDER BY COUNT(*) DESC;


    2) Flag certain records (simple binary column):
        CASE
        WHEN column_a is NULL
        OR column_b = 'some impossible value'
        THEN 1
        ELSE 0
        END AS is_dirty


    3) Clean up messy values (to standardize them):
        CASE
        WHEN location IN('CA','ca','San Diego, CA') THEN 'CA'
        ELSE location
        END AS clean_location


    4) Bucket overly granular values into segments:
        CASE
        WHEN age < 18 THEN '1. < 18'
        WHEN age BETWEEN 18 AND 35 THEN '2. 18-35'
        WHEN age > 35 THEN '3. > 35'
        END AS age_group


    5) Analyze values across multiple records (powerful):
        -- note the use of 3,2,1 to reorder the values
        SELECT
        subscriber_id,
        MAX(
        CASE
        WHEN status = 'paying_subscriber' THEN '3. Active'
        WHEN status = 'still_in_trial' THEN '2. Trial'
        WHEN status = 'canceled' THEN '1. Canceled'
        END) AS current_status
        FROM subscription_table
        GROUP BY subscriber_id


    What else do you do to clean your data?
                    

Database Used:

1) Basics

  1. Top 5 rows
  2. WHERE , AND, OR, NOT, IN
  3. ORDER BY, ASC, DESC
  4. IS NULL
  5. LIMIT
  6. Last 3 records
  7. MIN, MAX, COUNT, AVG, SUM
  8. LIKE, WILDCARDS
  9. IN BETWEEN
  10. UNION
  11. LEFT, RIGHT, MID, CONCAT
  12. GROUP BY
  13. FIRST_VALUE, LAST_VALUE
  14. CASE WHEN
  15. UNION
  16. Count the number of non-null entries in a column
  17. When to use group by vs. distinct
  18. 1st and last record from the table
  19. Count of emp working in same dept
  20. String formatting, substring
  21. Subqueries
  22. HAVING vs WHERE
  23. MAX and MIN
  24. Difference between HAVING and WHERE
  25. When are COUNT(*) and COUNT DISTINCT equal
  26. Can left and inner join return the same results?
  27. What is the difference between RANK and DENSE RANK
  28. Can UNION and UNION ALL return the same results?
  29. What are the best scenarios to use a self join?
  30. How can you use SQL to dedupe rows?
  31. Why would you use GROUP BY ROLLUP?
  32. What happens if I GROUP BY a column that is not in the SELECT statement? Why does this happen?
  33. For dealing with NULL values, why would I choose to use IFNULL vs. CASE WHEN?
  34. Do temp tables make your code cleaner and faster, one of the two, or none? Why?
  35. When is a subquery a bad idea? A good idea?
  36. Why use the WITH statement? What does it help you do?
  37. Can you use COUNT DISTINCT on two columns? How?
  38. When does a full outer join actually make sense to use?
  39. Can UNION and UNION ALL return the same results?
  40. When is it a bad idea to use a subquery?
  41. What is the best approach to doing a running total?
  42. Inner joining tables A and B returns 50 rows. Left join B to A returns 125 rows. What's going on here?
  43. User Defined Functions
  44. COALESCE and CASE WHEN

2) Intermediate

  1. CTEs and WITH
  2. Using CTEs and temporary tables to improve readability
  3. 2nd highest salarry
  4. Nth Highest Salarry
  5. Using NTH Value
  6. Using ROW_NUMBER()
  7. Find all unique combination of two rows
  8. Using CTEs
  9. Name ends with 'a' & has 5 alphabets
  10. Calculate even & odd records from a table
  11. Window functions for moving average analysis
  12. Use a window function to calculate a running total
    Running total is, cumulative sum of all previous row State Level Running Total
  13. Window functions like rank and row

  14. LAG and LEAD

  15. In what scenarios is the LAG function useful?
  16. Write a SQL Query to fetch all the duplicate username records in a table
  17. From the login_details table, fetch the users who logged in consecutively 3 or more times
  18. SQL Query to fetch days when the temperature was very cold for consecutively 3 or more days
  19. Date-time manipulation

    A good video on date-time manipulation in SQL

  20. How to speed up a query
  21. When will ROW_NUMBER and RANK give different results? Give an example
  22. Is it possible for LEFT JOIN and FULL OUTER JOIN to produce the same results? Why or why not?
  23. LEFT SEMI and ANTI joins (not just LEFT and INNER)
  24. Cumulative aggregates using window functions
  25. Recursive CTEs for hierarchical problems
  26. Using table indexes and partitions to speedup queries
  27. Datetime manipulations for time series style data
  28. Identifying and isolating duplicates in datasets
  29. UDFs, views and loops to further automate processes
  30. Query optimization



References Used:

  • https://www.linkedin.com/posts/eric-weber-060397b7_datascience-data-sql-activity-6787722309848350721-PlIH
  • https://www.youtube.com/watch?v=UlXtfq-kuF4
  • https://www.linkedin.com/posts/srishti-srivastava-aaa04113a_datascience-sql-data-activity-6782611503947268097-ftdA
  • https://www.linkedin.com/posts/eric-weber-060397b7_data-datascience-analytics-activity-6794261762872020992-0Eha
  • https://www.linkedin.com/posts/eric-weber-060397b7_data-datascience-sql-activity-6780300222632407040-kwSo
  • https://www.linkedin.com/posts/eric-weber-060397b7_sql-data-datascience-activity-6808737477549969408-TJSi
  • https://www.linkedin.com/posts/eric-weber-060397b7_sql-datascience-machinelearning-activity-6808702703619903488-T2yA
  • https://www.linkedin.com/posts/datawithdanny_data-analytics-datascience-activity-6805675440045469696-NSqo
  • https://www.linkedin.com/posts/eric-weber-060397b7_data-datascience-analytics-activity-6804394683897298944-oLY4
  • https://www.linkedin.com/posts/eric-weber-060397b7_data-datascience-sql-activity-6715986430864224256-xDv3
  • https://www.linkedin.com/posts/eric-weber-060397b7_sql-datascience-data-activity-6707148760071372801-zAKk
  • https://www.linkedin.com/posts/akashraj001_my-checklist-before-going-for-an-sql-round-activity-6794270380061188096-ATmc
  • https://www.linkedin.com/posts/eric-weber-060397b7_sql-datascience-statistics-activity-6808368707434291200-Uf70
  • https://www.linkedin.com/posts/johnpauler_mavenquicktips-sql-data-activity-6807275059107942400-GRt_