I have a very simple table:
CREATE TABLE IF NOT EXISTS LuxLog (
Sensor TINYINT,
Lux INT,
PRIMARY KEY(Sensor)
)
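It contains thousands of log entries from different sensors.
I would like to get Q1 and Q3 for every sensor.
I could run one query per sensor, but it would be better to cover all sensors with a single query (getting both Q1 and Q3 from that one query).
I assumed this would be a fairly simple operation, since quartiles are widely used and are among the main statistics in frequency calculations. In practice I found plenty of overly complicated solutions, when I was hoping for something clean and simple.
Can anyone give me a hint?
Edit: here is a piece of code I found online, but it doesn't work for me: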
SELECT SUBSTRING_INDEX(
SUBSTRING_INDEX(
GROUP_CONCAT( -- 1) make a sorted list of values
Lux
ORDER BY Lux
SEPARATOR ','
)
, ',' -- 2) cut at the comma
, 75/100 * COUNT(*) -- at the position just past the 75% portion
)
, ',' -- 3) cut at the comma
, -1 -- right after the desired list entry
) AS `75th Percentile`
FROM LuxLog
WHERE Sensor=12
AND Lux<>0
I get 1 as the return value, whereas it should be a number divisible by 10 (10, 20, 30 … 1000).
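To make the intent of that snippet explicit, here is a minimal Python sketch of what the GROUP_CONCAT / SUBSTRING_INDEX trick is meant to compute. The function name `percentile_by_string_cut` is made up for illustration, and the sketch assumes the fractional count is rounded to the nearest integer; it only shows the expected result and does not diagnose why the SQL returned 1.

```python
def percentile_by_string_cut(values, pct):
    """Mimic GROUP_CONCAT(... ORDER BY ...) followed by two SUBSTRING_INDEX cuts."""
    ordered = sorted(values)
    # GROUP_CONCAT(Lux ORDER BY Lux SEPARATOR ',')
    joined = ",".join(str(v) for v in ordered)
    # pct/100 * COUNT(*); assumption: rounded half up to a whole number of entries
    k = int(pct / 100 * len(ordered) + 0.5)
    # SUBSTRING_INDEX(joined, ',', k): keep everything before the k-th comma
    head = ",".join(joined.split(",")[:k])
    # SUBSTRING_INDEX(head, ',', -1): keep the last remaining entry
    return int(head.split(",")[-1])


lux = list(range(10, 1010, 10))  # 10, 20, ..., 1000
print(percentile_by_string_cut(lux, 75))
```

With 100 values the cut keeps the first 75 entries, so the last one left is the 75th smallest value.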
See SqlFiddle: http://SqlFiddle.com/#!9/accca6/2/6
Note: the SqlFiddle I generated has 100 rows, one for each integer between 1 and 100, inserted in random order (shuffled in Excel).
Here is the code:
SET @number_of_rows := (SELECT COUNT(*) FROM LuxLog);
SET @quartile := (ROUND(@number_of_rows*0.25));
SET @sql_q1 := (CONCAT('(SELECT "Q1" AS quartile_name , Lux, Sensor FROM LuxLog ORDER BY Lux DESC LIMIT 1 OFFSET ', @quartile,')'));
SET @sql_q3 := (CONCAT('( SELECT "Q3" AS quartile_name , Lux, Sensor FROM LuxLog ORDER BY Lux ASC LIMIT 1 OFFSET ', @quartile,');'));
SET @sql := (CONCAT(@sql_q1,' UNION ',@sql_q3));
PREPARE stmt1 FROM @sql;
EXECUTE stmt1;
Edit (the same approach restricted to a single sensor):
SET @current_sensor := 101;
SET @quartile := (ROUND((SELECT COUNT(*) FROM LuxLog WHERE Sensor = @current_sensor)*0.25));
SET @sql_q1 := (CONCAT('(SELECT "Q1" AS quartile_name , Lux, Sensor FROM LuxLog WHERE Sensor=', @current_sensor,' ORDER BY Lux DESC LIMIT 1 OFFSET ', @quartile,')'));
SET @sql_q3 := (CONCAT('( SELECT "Q3" AS quartile_name , Lux, Sensor FROM LuxLog WHERE Sensor=', @current_sensor,' ORDER BY Lux ASC LIMIT 1 OFFSET ', @quartile,');'));
SET @sql := (CONCAT(@sql_q1,' UNION ',@sql_q3));
PREPARE stmt1 FROM @sql;
EXECUTE stmt1;
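The basic reasoning is as follows. For quartile 1 we want the value 25% of the way down from the top. (Under the more common convention Q1 is the lower quartile and Q3 the upper one; if you follow that convention, swap DESC and ASC in the two queries.) Either way, we first need to know how many rows there are: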
SET @number_of_rows := (SELECT COUNT(*) FROM LuxLog);
Now that we know the number of rows, we want to know what 25% of that is, which is this line:
SET @quartile := (ROUND(@number_of_rows*0.25));
Then, to find a quartile, we sort the LuxLog table by Lux and fetch row number @quartile. To do that we set OFFSET to @quartile, meaning we start selecting at row @quartile, and we use LIMIT 1 to retrieve just one row. That gives:
SET @sql_q1 := (CONCAT('(SELECT "Q1" AS quartile_name , Lux, Sensor FROM LuxLog ORDER BY Lux DESC LIMIT 1 OFFSET ', @quartile,')'));
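For the other quartile we do (almost) the same thing, but instead of starting from the top (higher values down to lower ones) we start from the bottom, which is why that query uses ASC.
At this point we only have strings stored in the variables @sql_q1 and @sql_q3, so we concatenate them with UNION to combine the two results, then prepare the resulting query and execute it.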
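The count, offset, and LIMIT 1 steps above can also be sketched without dynamic SQL. The following is a rough Python illustration using an in-memory SQLite database as a stand-in for MySQL; the helper `quartiles_for_sensor` and the simplified two-column table are assumptions of this sketch, and it follows the common convention that Q1 is the lower quartile (ascending sort) and Q3 the upper one.

```python
import sqlite3


def quartiles_for_sensor(conn, sensor):
    """Return (Q1, Q3) for one sensor via COUNT plus ORDER BY ... LIMIT 1 OFFSET."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM LuxLog WHERE Sensor = ?", (sensor,)
    ).fetchone()
    # Same arithmetic as @quartile: 25% of the row count, rounded half up
    offset = int(n * 0.25 + 0.5)
    q1 = conn.execute(
        "SELECT Lux FROM LuxLog WHERE Sensor = ? ORDER BY Lux ASC LIMIT 1 OFFSET ?",
        (sensor, offset),
    ).fetchone()
    q3 = conn.execute(
        "SELECT Lux FROM LuxLog WHERE Sensor = ? ORDER BY Lux DESC LIMIT 1 OFFSET ?",
        (sensor, offset),
    ).fetchone()
    # fetchone() returns None when the offset runs past the last row
    return (q1[0] if q1 else None, q3[0] if q3 else None)


offset_conn = sqlite3.connect(":memory:")
# Hypothetical stand-in schema: no primary key, so a sensor can have many rows
offset_conn.execute("CREATE TABLE LuxLog (Sensor INTEGER, Lux INTEGER)")
offset_conn.executemany(
    "INSERT INTO LuxLog VALUES (?, ?)", [(101, v) for v in range(1, 101)]
)
print(quartiles_for_sensor(offset_conn, 101))
```

Because the values are bound as parameters, no string concatenation or PREPARE step is needed here; in plain MySQL the prepared-statement route is still required, since LIMIT and OFFSET there do not accept user variables directly.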
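This is very simple with NTILE, but that is a window function: Postgres supports it, while MySQL only gained window functions (NTILE included) in version 8.0. You basically just need to do something like this: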
SELECT value_you_are_NTILING,
NTILE(4) OVER (ORDER BY value_you_are_NTILING DESC) AS tiles
FROM
(SELECT math_that_gives_you_the_value_you_are_NTILING_here AS value_you_are_NTILING FROM tablename) AS t; -- Postgres requires an alias on a subquery in FROM
Here is a simple example I put together for you on SQLFiddle: http://sqlfiddle.com/#!15/7f05a/1
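To connect NTILE back to the original question (Q1 and Q3 for every sensor in one query), here is a hedged sketch. It runs the SQL through Python's sqlite3 module, assuming SQLite 3.25 or later for window function support; the same query shape should carry over to Postgres or MySQL 8.0+. The sample data and the choice to report the upper edge of tiles 1 and 3 as Q1 and Q3 are assumptions of this sketch, and other quartile conventions give slightly different values.

```python
import sqlite3

ntile_conn = sqlite3.connect(":memory:")
# Hypothetical stand-in schema with many rows per sensor
ntile_conn.execute("CREATE TABLE LuxLog (Sensor INTEGER, Lux INTEGER)")
sample = [(1, v) for v in range(10, 90, 10)] + [(2, v) for v in range(100, 900, 100)]
ntile_conn.executemany("INSERT INTO LuxLog VALUES (?, ?)", sample)

NTILE_QUERY = """
SELECT Sensor,
       MAX(CASE WHEN tile = 1 THEN Lux END) AS q1,  -- top of the first tile
       MAX(CASE WHEN tile = 3 THEN Lux END) AS q3   -- top of the third tile
FROM (
    SELECT Sensor, Lux,
           NTILE(4) OVER (PARTITION BY Sensor ORDER BY Lux) AS tile
    FROM LuxLog
) AS t
GROUP BY Sensor
ORDER BY Sensor
"""

ntile_result = ntile_conn.execute(NTILE_QUERY).fetchall()
print(ntile_result)
```

PARTITION BY Sensor restarts the four tiles for each sensor, which is what lets a single query cover all sensors.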
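In MySQL you can use a rank instead. Here is the SQLFiddle for it: http://www.SQLFiddle.com/#!2/D5587/1 (this comes from the question linked below).
This use of RANK() in MySQL comes from the Stack Overflow question "Rank function in MySQL"; look for the answer by Salman A.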
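For readers without access to the linked answer, one generic way to get a rank without window functions is a correlated subquery that counts the smaller values. This is not necessarily the technique used in that answer; it is a sketch under that caveat, run here against SQLite from Python with made-up sample data.

```python
import sqlite3

rank_conn = sqlite3.connect(":memory:")
# Hypothetical stand-in schema and data
rank_conn.execute("CREATE TABLE LuxLog (Sensor INTEGER, Lux INTEGER)")
rank_conn.executemany(
    "INSERT INTO LuxLog VALUES (?, ?)", [(12, v) for v in (30, 10, 20, 20, 40)]
)

RANK_QUERY = """
SELECT a.Lux,
       (SELECT COUNT(*)
          FROM LuxLog b
         WHERE b.Sensor = a.Sensor
           AND b.Lux < a.Lux) + 1 AS rnk  -- rank = 1 + number of smaller values
FROM LuxLog a
ORDER BY a.Lux
"""

ranked = rank_conn.execute(RANK_QUERY).fetchall()
print(ranked)
```

Ties share a rank and the next rank is skipped, which matches the behaviour of the standard RANK() function. The subquery runs once per row, so this is likely to be slow on large tables.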
Something like this should do it:
select
ll.*,
if (a.position is not null, 1,
if (b.position is not null, 2,
if (c.position is not null, 3,
if (d.position is not null, 4, 0)))
) as quartile
from
luxlog ll
left outer join luxlog a on ll.position = a.position and a.lux > (select count(*)*0.00 from luxlog) and a.lux <= (select count(*)*0.25 from luxlog)
left outer join luxlog b on ll.position = b.position and b.lux > (select count(*)*0.25 from luxlog) and b.lux <= (select count(*)*0.50 from luxlog)
left outer join luxlog c on ll.position = c.position and c.lux > (select count(*)*0.50 from luxlog) and c.lux <= (select count(*)*0.75 from luxlog)
left outer join luxlog d on ll.position = d.position and d.lux > (select count(*)*0.75 from luxlog)
;
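Here is the complete example (note that it adds a position column to the table and uses it as the primary key):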
use example;
drop table if exists luxlog;
CREATE TABLE luxlog ( -- lowercase to match the other statements; table names can be case-sensitive
Sensor TINYINT,
Lux INT,
position int,
PRIMARY KEY(Position)
);
insert into luxlog values (0, 1, 10);
insert into luxlog values (0, 2, 20);
insert into luxlog values (0, 3, 30);
insert into luxlog values (0, 4, 40);
insert into luxlog values (0, 5, 50);
insert into luxlog values (0, 6, 60);
insert into luxlog values (0, 7, 70);
insert into luxlog values (0, 8, 80);
select count(*)*.25 from luxlog;
select count(*)*.50 from luxlog;
select
ll.*,
a.position,
b.position,
if(
a.position is not null, 1,
if (b.position is not null, 2, 0)
) as quartile
from
luxlog ll
left outer join luxlog a on ll.position = a.position and a.lux >= (select count(*)*0.00 from luxlog) and a.lux < (select count(*)*0.25 from luxlog)
left outer join luxlog b on ll.position = b.position and b.lux >= (select count(*)*0.25 from luxlog) and b.lux < (select count(*)*0.50 from luxlog)
left outer join luxlog c on ll.position = c.position and c.lux >= (select count(*)*0.50 from luxlog) and c.lux < (select count(*)*0.75 from luxlog)
left outer join luxlog d on ll.position = d.position and d.lux >= (select count(*)*0.75 from luxlog) and d.lux < (select count(*)*1.00 from luxlog)
;
select
ll.*,
if (a.position is not null, 1,
if (b.position is not null, 2,
if (c.position is not null, 3,
if (d.position is not null, 4, 0)))
) as quartile
from
luxlog ll
left outer join luxlog a on ll.position = a.position and a.lux > (select count(*)*0.00 from luxlog) and a.lux <= (select count(*)*0.25 from luxlog)
left outer join luxlog b on ll.position = b.position and b.lux > (select count(*)*0.25 from luxlog) and b.lux <= (select count(*)*0.50 from luxlog)
left outer join luxlog c on ll.position = c.position and c.lux > (select count(*)*0.50 from luxlog) and c.lux <= (select count(*)*0.75 from luxlog)
left outer join luxlog d on ll.position = d.position and d.lux > (select count(*)*0.75 from luxlog)
;
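One thing to watch in the query above: it compares Lux itself against fractions of the row count, which lines up in the sample data only because the Lux values happen to be 1 through 8. A sketch that assigns quartiles by sorted position instead is shown below. It assumes window functions are available (SQLite 3.25+ here via Python, MySQL 8.0+ otherwise) and uses made-up Lux readings.

```python
import sqlite3

bucket_conn = sqlite3.connect(":memory:")
bucket_conn.execute(
    "CREATE TABLE luxlog (Sensor INTEGER, Lux INTEGER, position INTEGER PRIMARY KEY)"
)
# Lux readings deliberately unrelated to the row count
readings = [500, 120, 90, 300, 45, 800, 610, 270]
bucket_conn.executemany(
    "INSERT INTO luxlog VALUES (?, ?, ?)",
    [(0, lux, pos) for pos, lux in enumerate(readings, start=1)],
)

BUCKET_QUERY = """
SELECT Lux,
       (rn - 1) * 4 / cnt + 1 AS quartile  -- integer division in SQLite; use DIV in MySQL
FROM (
    SELECT Lux,
           ROW_NUMBER() OVER (ORDER BY Lux) AS rn,  -- 1-based position in sorted order
           COUNT(*) OVER () AS cnt                  -- total number of rows
    FROM luxlog
) AS ranked_rows
ORDER BY Lux
"""

buckets = bucket_conn.execute(BUCKET_QUERY).fetchall()
print(buckets)
```

Each row lands in quartile 1 to 4 according to where it falls in the sorted order, regardless of the magnitude of its Lux value.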