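I am using Python and PySpark in a Jupyter notebook. I am trying to read several Parquet files from an Amazon Web Services (AWS) S3 bucket and convert them into a single JSON file. This is what I have:
from pyspark.sql import DataFrame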
for key in bucket.objects.all():
pr
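Spark can read all the Parquet objects in a single call and then write one JSON output, so a per-file loop with `DataFrame` unions is not strictly needed. Below is a minimal sketch that assumes three things: the objects are reachable through the `s3a://` connector (hadoop-aws on the classpath, credentials configured), `bucket` is a boto3 `Bucket` resource as in the loop above, and a SparkSession named `spark` already exists. The helper names `parquet_paths` and `parquet_to_single_json` are hypothetical.

```python
from typing import Iterable, List


def parquet_paths(keys: Iterable[str], bucket_name: str) -> List[str]:
    """Build s3a:// URIs for the Parquet objects among the listed keys.

    Hypothetical helper. The s3a:// scheme assumes the hadoop-aws
    connector is available to Spark.
    """
    return [f"s3a://{bucket_name}/{key}" for key in keys if key.endswith(".parquet")]


def parquet_to_single_json(spark, paths: List[str], out_dir: str) -> None:
    """Read all Parquet files into one DataFrame and write it as JSON.

    coalesce(1) forces a single output partition, so Spark writes one
    part-*.json file (JSON Lines) inside the directory out_dir,
    not a bare file named out_dir.
    """
    df = spark.read.parquet(*paths)  # DataFrameReader.parquet accepts several paths
    df.coalesce(1).write.mode("overwrite").json(out_dir)


# Example usage (requires boto3, pyspark and S3 access):
# keys = (obj.key for obj in bucket.objects.all())
# paths = parquet_paths(keys, bucket.name)
# parquet_to_single_json(spark, paths, "s3a://my-output-bucket/json-out")  # hypothetical path
```

`coalesce(1)` sends all rows through one task, which is fine for modest data but slow for large data. If the files have slightly different schemas, `spark.read.option("mergeSchema", "true")` can be used before `.parquet(...)`.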
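However, an error occurs when I create an RDD in the interactive shell with the following code snippet (see also Example 2-1 on page 32 of the book mentioned above):
File "D:\Software\spark-3.2.1-bin-hadoop3.2\python\pyspark\rdd.py", line 1226, in sum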
return\spark-3.2.1-bin-hadoop3.2\python
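RDD creation is lazy. The traceback ends in `rdd.py` inside `sum`, which is the action that actually launches the job and its Python worker processes, so the real failure only shows up at that point. Because the full error is not shown, the following is only an assumption: on Windows, a common first check is to make the workers use the same interpreter as the driver by setting `PYSPARK_PYTHON` before the SparkContext is created. `use_current_python` is a hypothetical helper name.

```python
import os
import sys


def use_current_python() -> str:
    """Point Spark's Python workers at the interpreter running this code.

    SparkContext reads PYSPARK_PYTHON when it is created, so call this
    before creating the SparkContext/SparkSession.
    """
    os.environ["PYSPARK_PYTHON"] = sys.executable
    return os.environ["PYSPARK_PYTHON"]


# Example usage (requires pyspark):
# use_current_python()
# from pyspark import SparkContext
# sc = SparkContext("local", "sum-test")
# rdd = sc.parallelize([1, 2, 3, 4])  # lazy: nothing runs yet
# print(rdd.sum())                     # action: worker errors surface here
```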
bin-without-hadoop, which is located in the following directory. When I change into that directory, go into bin, and try to run pyspark, I get the following error:
/usr/local/bin/pyspark: line 24: ~/Desktop/ahajib/opt/spark-2.1.0-bin-without-hadoop/bi
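The path in that message still contains a literal `~`. Bash performs tilde expansion only on an unquoted `~` at the start of a word, not inside quotes and not on the value of an expanded variable, so if `SPARK_HOME` was set to a quoted string beginning with `~/`, the launcher looks for a directory literally named `~`. Since the message is truncated, this diagnosis is an assumption. A minimal check in Python, using the hypothetical helper name `resolve_spark_home`:

```python
import os


def resolve_spark_home(env=None) -> str:
    """Return SPARK_HOME with a leading '~' expanded to the home directory.

    Hypothetical helper: it only normalizes the value and does not verify
    that a Spark installation actually exists there.
    """
    env = os.environ if env is None else env
    raw = env.get("SPARK_HOME", "")
    return os.path.abspath(os.path.expanduser(raw)) if raw else ""


def has_pyspark_launcher(spark_home: str) -> bool:
    """True if <spark_home>/bin/pyspark exists as a file."""
    return os.path.isfile(os.path.join(spark_home, "bin", "pyspark"))


if __name__ == "__main__":
    home = resolve_spark_home()
    print(f"SPARK_HOME resolves to {home!r}, launcher found: {has_pyspark_launcher(home)}")
```

If the resolved path differs from the raw value, writing `$HOME` instead of `~` inside the quotes of the `export SPARK_HOME=...` line avoids the problem.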
wordCounts.items():
After running it from the terminal, I don't get an error.
\lib\pyspark.zip\pyspark\worker.py", line 25, in <module>
ModuleNotFoundError: No module named 'resource'
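`resource` is part of Python's standard library but is available only on Unix, so any code that imports it unconditionally fails on Windows with exactly this `ModuleNotFoundError`. The traceback points at module level of `worker.py` inside `pyspark.zip`. That suggests, as an assumption given the truncated output, a Spark build whose worker imports `resource` unconditionally, or an environment that picks up a different Spark than the one used from the terminal. Comparing the Spark version with the installed `pyspark` package is a reasonable first step. The sketch shows a guarded-import pattern and a check; `resource_available` and `HAS_RESOURCE` are hypothetical names.

```python
import importlib.util
import sys


def resource_available() -> bool:
    """True if the Unix-only standard-library module 'resource' can be imported."""
    return importlib.util.find_spec("resource") is not None


# Guarded import: record availability instead of failing at import time.
try:
    import resource  # noqa: F401  (Unix only)
    HAS_RESOURCE = True
except ImportError:  # ModuleNotFoundError is a subclass of ImportError
    HAS_RESOURCE = False


if __name__ == "__main__":
    print(f"platform={sys.platform} resource importable={resource_available()}")
```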