Get the highest answer rate question from a tablesurvey_log
with these columns:uid,action,question_id,answer_id,q_num,timestamp.
uid means user id; action has these kind of values: "show", "answer", "skip"; answer_id is not null when action column is "answer", while is null for "show" and "skip"; q_num is the numeral order of the question in current session.
Write a sql query to identify the question which has the highest answer rate.
Example:
Input:
+------+-----------+--------------+------------+-----------+------------+
| uid | action | question_id | answer_id | q_num | timestamp |
+------+-----------+--------------+------------+-----------+------------+
| 5 | show | 285 | null | 1 | 123 |
| 5 | answer | 285 | 124124 | 1 | 124 |
| 5 | show | 369 | null | 2 | 125 |
| 5 | skip | 369 | null | 2 | 126 |
+------+-----------+--------------+------------+-----------+------------+
Output:
+-------------+
| survey_log |
+-------------+
| 285 |
+-------------+
Explanation:
question 285 has answer rate 1/1, while question 369 has 0/1 answer rate, so output 285.
Note:The highest answer rate meaning is: answer number's ratio in show number in the same question.
Solution
Approach I: Using sub-query andSUM()
function [Accepted]
Intuition
Calculate the answered times / show times for each question.
Algorithm
First, we can useSUM()
to get the total number of answered times as well as the show times for each question using a sub-query as below.
SELECT
question_id,
SUM(CASE
WHEN action = 'answer' THEN 1
ELSE 0
END) AS num_answer,
SUM(CASE
WHEN action = 'show' THEN 1
ELSE 0
END) AS num_show
FROM
survey_log
GROUP BY question_id
;
| question_id | num_answer | num_show |
|-------------|------------|----------|
| 285 | 1 | 1 |
| 369 | 0 | 1 |
Then we can calculate the answer rate by its definition.
MySQL
SELECT question_id as survey_log
FROM
(
SELECT question_id,
SUM(case when action="answer" THEN 1 ELSE 0 END) as num_answer,
SUM(case when action="show" THEN 1 ELSE 0 END) as num_show,
FROM survey_log
GROUP BY question_id
) as tbl
ORDER BY (num_answer / num_show) DESC
LIMIT 1
Approach II: Using sub-query andCOUNT(IF...)
function [Accepted]
Algorithm
This solution is very straight forward: use theCOUNT()
function to sum the answer and show time combining with theIF()
function.
MySQL
SELECT
question_id AS 'survey_log'
FROM
survey_log
GROUP BY question_id
ORDER BY COUNT(answer_id) / COUNT(IF(action = 'show', 1, 0)) DESC
LIMIT 1;