I'm just getting started with cloud services and I'm trying to set up a connection between Databricks and Azure. I have a notebook in Databricks that generates a dataframe, and I want to use it to populate a dedicated SQL pool in Synapse.
After following the actions and steps suggested in the Microsoft documentation, I'm running into the error below.
Code
df = spark.read \
.format("com.databricks.spark.sqldw") \
.option("url", <the-rest-of-the-connection-string>") \
.option("forwardSparkAzureStorageCredentials", "true") \
.option("dbTable", "Table") \
.option("tempDir", "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>") \
.load()
Error
Py4JJavaError: An error occurred while calling o1509.save.
: com.databricks.spark.sqldw.SqlDWConnectorException: Exception encountered in Azure Synapse Analytics connector code.
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: 14
Some considerations
Can someone help me understand what is going on here?
Posted on 2022-07-07 06:46:21
Follow the steps below.
Configure the Azure Storage account
spark.conf.set("fs.azure.account.key.<your_storage_account>.blob.core.windows.net", "<your_storage_account_access_key>")
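As a side note, rather than pasting the access key directly into the notebook, it can be pulled from a Databricks secret scope. The sketch below assumes a scope named my-scope and a secret named storage-key, both hypothetical names; it also sets the key on the dfs endpoint, which is what abfss:// (ADLS Gen2) paths like the tempDir below normally use.
# Hypothetical secret scope/secret names: replace with your own
storage_key = dbutils.secrets.get(scope="my-scope", key="storage-key")
# Key for wasbs:// (Blob) paths
spark.conf.set("fs.azure.account.key.<your_storage_account>.blob.core.windows.net", storage_key)
# Key for abfss:// (ADLS Gen2) paths, matching the tempDir used later
spark.conf.set("fs.azure.account.key.<your_storage_account>.dfs.core.windows.net", storage_key)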
Azure Synapse configuration
Database = "<Database_Name>"
Server = "<Server_Name>"
User = "<Database_Username>"
Pass = "<Database_Password>"
JdbcPort = "1433"
JdbcExtraOptions = "encrypt=true;trustServerCertificate=true;hostNameInCertificate=*.database.windows.net;loginTimeout=30;"
sqlUrl = f"jdbc:sqlserver://{Server}:{JdbcPort};database={Database};user={User};password={Pass};{JdbcExtraOptions}"
Azure Data Lake Gen 2
tempDir = "abfss://<container>@<your_storage_account_name>.dfs.core.windows.net/<folder>"
Azure Synapse table
tableName = "<your_sql_table>"
Read data from Azure Synapse
df = spark.read \
.format("com.databricks.spark.sqldw") \
.option("url", sqlUrl) \
.option("tempDir", tempDir) \
.option("forwardSparkAzureStorageCredentials", "true") \
.option("dbTable", tableName) \
.load()
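Since the original goal was to populate a dedicated SQL pool (and the stack trace points at a save call), the write path mirrors the read. A minimal sketch, assuming df is the dataframe produced by the notebook and reusing the sqlUrl, tempDir and tableName defined above:
# Write the dataframe to the dedicated SQL pool via the Synapse connector
df.write \
.format("com.databricks.spark.sqldw") \
.option("url", sqlUrl) \
.option("tempDir", tempDir) \
.option("forwardSparkAzureStorageCredentials", "true") \
.option("dbTable", tableName) \
.mode("overwrite") \
.save()
Whether to use "overwrite" or "append" depends on whether the target table should be recreated or extended on each run.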
Reference:
https://stackoverflow.com/questions/72873898