I want the rows with the highest and the lowest count, but I'm getting the whole output. Should I use the dense_rank or the rank window function?
Like so:

popular_eco_move = spark.sql("""
    select a.eco, b.eco_name, count(b.eco_name) as number_of_occurance
    from chess_game as a, chess_eco_codes as b
    where a.eco = b.eco
    group by a.eco, b.eco_name
    order by number_of_occurance desc
""")
popular_eco_move.show(10)
+---+--------------------+-------------------+
|eco| eco_name|number_of_occurance|
+---+--------------------+-------------------+
|C42| Petrov Defense| 64|
|E15| Queen's Indian| 56|
|C88| Ruy Lopez| 46|
|D37|Queen's Gambit De...| 44|
|B90| Sicilian, Najdorf| 38|
|C67| Ruy Lopez| 37|
|B12| Caro-Kann Defense| 37|
|C11| French| 35|
|C45| Scotch Game| 34|
|D27|Queen's Gambit Ac...| 32|
+---+--------------------+-------------------+
only showing top 10 rows

Result attributes: eco, eco_name, number_of_occurance. The final result should have only two rows.
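The choice only matters when counts tie. A minimal pure-Python sketch (no Spark needed, hypothetical helper names) of how RANK and DENSE_RANK differ on tied values, in descending order:

def rank(values):
    # RANK: tied values share a rank, and the next rank skips ahead.
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def dense_rank(values):
    # DENSE_RANK: tied values share a rank, with no gaps afterwards.
    distinct = sorted(set(values), reverse=True)
    return [distinct.index(v) + 1 for v in values]

counts = [64, 37, 37, 32]
print(rank(counts))        # [1, 2, 2, 4]
print(dense_rank(counts))  # [1, 2, 2, 3]

For selecting "all rows with the maximum count", either function works, since rank 1 is the same in both; they diverge only for ranks after a tie.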
Posted on 2022-09-25 09:12:06
Hello, try a WITH clause to store a query and reuse it like this:
with my_select as
(select a.eco,b.eco_name,count(b.eco_name) as occurance
from `game`.`chess_game` as a, `game`.`chess_eco_codes` as b where a.eco=b.eco
group by a.eco,b.eco_name)
select * from my_select
where occurance = (select max(occurance) from my_select)
or occurance = (select min(occurance) from my_select)

Posted on 2022-09-25 12:53:44
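The min/max subquery filter from this answer can be checked with the stdlib sqlite3 module, using a small pre-aggregated table in place of the CTE (the table contents here are illustrative, taken from the question's output):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table my_select (eco text, eco_name text, occurance int)")
conn.executemany(
    "insert into my_select values (?, ?, ?)",
    [("C42", "Petrov Defense", 64),
     ("C67", "Ruy Lopez", 37),
     ("D27", "Queen's Gambit Accepted", 32)],
)
# Keep only the rows matching the global max or min count.
rows = conn.execute("""
    select * from my_select
    where occurance = (select max(occurance) from my_select)
       or occurance = (select min(occurance) from my_select)
""").fetchall()
print(rows)

Only the highest- and lowest-count rows remain; a tie at either extreme would return more than two rows, which is also what the WITH-clause answer does in Spark.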
If you are using PySpark, you should learn how to write it the Pythonic way, not just in SQL:
from pyspark.sql import functions as F
from pyspark.sql.window import Window as W
(df
.withColumn('rank_asc', F.dense_rank().over(W.orderBy(F.asc('number_of_occurance'))))
.withColumn('rank_desc', F.dense_rank().over(W.orderBy(F.desc('number_of_occurance'))))
.where((F.col('rank_asc') == 1) | (F.col('rank_desc') == 1))
# .drop('rank_asc', 'rank_desc') # to drop these two temp columns
.show()
)
+---+--------------------+-------------------+--------+---------+
|eco| eco_name|number_of_occurance|rank_asc|rank_desc|
+---+--------------------+-------------------+--------+---------+
|C42| Petrov Defense| 64| 9| 1|
|D27|Queen's Gambit Ac...| 32| 1| 9|
+---+--------------------+-------------------+--------+---------+

https://stackoverflow.com/questions/73842931