I have a table like the following:
+--------------------+--------------------+-------------------+
|                  ID|               point|          timestamp|
+--------------------+--------------------+-------------------+
|679ac975acc4bdec9...|POINT (-73.267631...|2020-01-01 17:10:49|
|679ac975acc4bdec9...|POINT (-73.271446...|2020-01-01 02:12:31|
|679ac975acc4bdec9...|POINT (-73.265991...|2020-01-01 17:10:40|
|679ac975acc4bdec9...|POINT (-73.271446...|2020-01-01 02:54:15|
|679ac975acc4bdec9...|POINT (-73.265609...|2020-01-01 17:10:24|
+--------------------+--------------------+-------------------+
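I want to compute the distance between all of the points, but I can't get it to work.
I can, however, compute the distance from each point in the point column to one specific point, like this: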
# Build the DataFrame first; .show() returns None, so its result should not be assigned
distances = spark.sql(
    """
    SELECT ID, timestamp, point,
           ST_Distance(point, ST_PointFromText('-74.00672149658203, 40.73177719116211', ',')) AS distance
    FROM myTable
    """)
distances.show(5)
+--------------------+-------------------+--------------------+------------------+
|                  ID|          timestamp|               point|          distance|
+--------------------+-------------------+--------------------+------------------+
|679ac975acc4bdec9...|2020-01-01 17:10:49|POINT (-73.267631...|0.7485722629444987|
|679ac975acc4bdec9...|2020-01-01 02:12:31|POINT (-73.271446...|0.7452303978930688|
|679ac975acc4bdec9...|2020-01-01 17:10:40|POINT (-73.265991...|0.7503403834426271|
|679ac975acc4bdec9...|2020-01-01 02:54:15|POINT (-73.271446...|0.7452310193408604|
|679ac975acc4bdec9...|2020-01-01 17:10:24|POINT (-73.265609...|0.7511492495935203|
+--------------------+-------------------+--------------------+------------------+
How can I compute the distance from the point in one row to the point in the next row?
Posted on 2020-03-29 01:11:00
If I understand the question correctly, you want the difference between adjacent rows of the point column. I believe you can do this with the lag function (https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.functions.lag) and a Window (http://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=window#pyspark.sql.Window):
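from pyspark.sql.functions import col, expr, lag, when
from pyspark.sql.window import Window
# orderBy decides which row counts as "previous"; use "timestamp" for chronological order
window = Window.partitionBy().orderBy("ID")
# Geometry columns cannot be subtracted with "-": fetch the previous point with lag() ...
df = df.withColumn('prev_point', lag(col('point')).over(window))
# ... then apply ST_Distance, skipping the first row, which has no previous point
df = df.withColumn('distance', when(col('prev_point').isNotNull(), expr('ST_Distance(point, prev_point)')))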
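To make the lag-over-a-window idea concrete, here is a minimal plain-Python sketch of the same logic, runnable without Spark. It is an illustration, not the Spark API. It assumes the rows should be ordered by timestamp and that the distance wanted is the planar (Euclidean) distance in coordinate units. The sample coordinates and the helper name consecutive_distances are hypothetical, since the values in the question are truncated.

```python
import math
from datetime import datetime

# Hypothetical sample rows: (id, (lon, lat), timestamp). The coordinates are
# made up for illustration; the real values in the question are truncated.
rows = [
    ("a", (-73.0, 40.0), "2020-01-01 17:10:49"),
    ("a", (-73.3, 40.4), "2020-01-01 02:12:31"),
    ("a", (-73.0, 40.0), "2020-01-01 17:10:40"),
]


def consecutive_distances(rows):
    """Order rows by timestamp, pair each row with the previous one, and
    return (id, timestamp, distance); the first row has distance None."""
    ordered = sorted(
        rows, key=lambda r: datetime.strptime(r[2], "%Y-%m-%d %H:%M:%S")
    )
    result = []
    prev_point = None  # plays the role of lag(point): no previous row yet
    for row_id, point, ts in ordered:
        if prev_point is None:
            distance = None
        else:
            # Planar distance between the current and the previous point
            distance = math.hypot(
                point[0] - prev_point[0], point[1] - prev_point[1]
            )
        result.append((row_id, ts, distance))
        prev_point = point
    return result


for row in consecutive_distances(rows):
    print(row)
```

The first row has no predecessor, so its distance is None, which mirrors the null that lag produces for the first row of a window. In Spark, adding the ID column to partitionBy would keep the comparison within each ID instead of across the whole table.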
https://stackoverflow.com/questions/60901316