Vector Database: Query a Specified CollectionView

Interface definition
The describe_collection_view() interface returns information about a specified CollectionView.
def describe_collection_view(
    collection_view_name: str,
    timeout: float | None = None
) -> CollectionView
Usage example
# `client` is an existing VectorDBClient() instance.
# Select the AI database.
db = client.database('db-test-ai')
# Query a CollectionView in the AI database.
res = db.describe_collection_view(collection_view_name='coll-ai-files')
print(vars(res))
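The dictionary printed by vars(res) comes out on a single line, which is hard to scan. As a minimal sketch, the helper below renders an object's attributes as indented JSON. The name dump_attributes is hypothetical and not part of the SDK, and the sketch assumes the returned object keeps its fields as plain instance attributes, as the vars(res) call above suggests.

```python
import json


def dump_attributes(obj) -> str:
    """Render an object's instance attributes as indented JSON.

    Hypothetical helper, not part of the SDK. Values that the json
    module cannot serialize natively are converted with str().
    """
    return json.dumps(vars(obj), default=str, ensure_ascii=False, indent=2)
```

With the example above, this would be called as print(dump_attributes(res)).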
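Input parameters
Parameter	Required	Description	Configuration
collection_view_name	Yes	Name of the CollectionView to query.	CollectionView naming rules: only English letters, digits, underscores (_), and hyphens (-) are allowed, and the name must start with an English letter. Length: [1, 128].
timeout	No	Request timeout.	Unit: seconds. Default: the timeout configured in VectorDBClient(). Value range: greater than or equal to 0.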
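The naming rules above can be checked on the client side before a request is sent. The following is a sketch only: is_valid_collection_view_name is a hypothetical helper, not an SDK function, and it assumes "English letters" means the ASCII ranges A-Z and a-z.

```python
import re

# Pattern derived from the naming rules above: starts with a letter,
# followed by letters, digits, "_" or "-", 1 to 128 characters in total.
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_-]{0,127}")


def is_valid_collection_view_name(name: str) -> bool:
    """Return True if `name` satisfies the documented naming rules."""
    return _NAME_PATTERN.fullmatch(name) is not None
```

For example, is_valid_collection_view_name('coll-ai-files') returns True, while a name that starts with a digit or exceeds 128 characters returns False. The server remains the authority on what it accepts.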
Response
{
  "database": "db-test-ai",
  "collectionView": "coll-ai-files",
  "description": "this is a collection description",
  "embedding": {
    "language": "zh",
    "enableWordsEmbedding": false
  },
  "splitterPreprocess": {
    "appendTitleToChunk": true,
    "appendKeywordsToChunk": true
  },
  "parsingProcess": {
    "parsingType": "AlgorithmParsing"
  },
  "indexes": [
    {
      "fieldName": "author",
      "fieldType": "string",
      "indexType": "filter"
    },
    {
      "fieldName": "documentSetId",
      "fieldType": "string",
      "indexType": "primaryKey"
    },
    {
      "fieldName": "documentSetName",
      "fieldType": "string",
      "indexType": "filter"
    },
    {
      "fieldName": "tags",
      "fieldType": "array",
      "indexType": "filter"
    }
  ],
  "createTime": "2023-11-27 17:16:54",
  "stats": {
    "indexedDocumentSets": 0,
    "totalDocumentSets": 0,
    "unIndexedDocumentSets": 0
  },
  "alias": [
    "alias-coll-ai-files"
  ]
}
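To make the structure of this response concrete, here is a small sketch that pulls out the primary key, the filterable fields, and the number of pending files. It assumes the response is available as a Python dict with the shape shown above (for instance, parsed from JSON); how the SDK's CollectionView object maps onto that dict is not covered here, and summarize_view is a hypothetical helper.

```python
import json


def summarize_view(payload: dict) -> dict:
    """Extract a few commonly needed facts from a describe response dict."""
    indexes = payload.get("indexes", [])
    primary = [i["fieldName"] for i in indexes if i.get("indexType") == "primaryKey"]
    filters = [i["fieldName"] for i in indexes if i.get("indexType") == "filter"]
    return {
        "name": payload.get("collectionView"),
        "primary_key": primary[0] if primary else None,
        "filter_fields": filters,
        "pending": payload.get("stats", {}).get("unIndexedDocumentSets", 0),
    }


# Trimmed copy of the sample response above.
sample = json.loads("""
{
  "collectionView": "coll-ai-files",
  "indexes": [
    {"fieldName": "author", "fieldType": "string", "indexType": "filter"},
    {"fieldName": "documentSetId", "fieldType": "string", "indexType": "primaryKey"},
    {"fieldName": "documentSetName", "fieldType": "string", "indexType": "filter"},
    {"fieldName": "tags", "fieldType": "array", "indexType": "filter"}
  ],
  "stats": {"indexedDocumentSets": 0, "totalDocumentSets": 0, "unIndexedDocumentSets": 0}
}
""")
summary = summarize_view(sample)
print(summary)
```

The filter_fields list is the set of fields that can appear in Filter condition expressions during later searches.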
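Parameter	Description	Sub-parameter	Description
database	-	-	Name of the AI database that contains the CollectionView.
collectionView	-	-	Name of the CollectionView.
embedding		language	Language of the files. Valid values: zh (Chinese), en (English), multi (multilingual).
		enableWordsEmbedding	Whether word-level (Words) vector re-ranking, with word vectorization, is enabled at retrieval time. true: enabled. false: disabled (default).
alias	-	-	All aliases of the CollectionView.
createTime	-	-	Creation time of the CollectionView.
description	-	-	Description of the CollectionView.
stats	File processing status	indexedDocumentSets	Number of files that have been processed.
		totalDocumentSets	Total number of files.
		unIndexedDocumentSets	Number of files that have not been processed.
splitterPreprocess	File preprocessing strategy	appendTitleToChunk	When a file is split, whether the title is appended to each resulting chunk and embedded together with it. false: not appended. true: the section title is appended to each chunk.
		appendKeywordsToChunk	When a file is split, whether keywords are appended to each resulting chunk and embedded together with it. false: not appended. true: the keywords of the full text are appended to each chunk.
		chunk_splitter	Regular expression that defines how documents are split. For example, \\n{2,} splits on two or more consecutive newlines, which is common for files of QA pairs.
parsingProcess	How PDF files are parsed	parsingType	Valid values: VisionModelParsing: files are parsed with a parsing model. Recommended; it can handle complex PDF layouts such as two-column text and tables. AlgorithmParsing: files are parsed algorithmically; this is the system default. Markdown, Word, and PPT files do not need this parameter and are parsed with AlgorithmParsing by default.
indexes	Primary key index, created by default on the file ID (documentSetId)	fieldName	The indexed field; fixed to documentSetId.
		fieldType	Data type of the indexed field; fixed to string.
		indexType	Fixed to primaryKey.
	Filter index, created by default on the file name (documentSetName)	fieldName	The indexed field is the file name; fixed to documentSetName.
		fieldType	Data type of the file-name field; fixed to string.
		indexType	Index type of the file-name field; fixed to filter. This index is what allows Filter condition expressions on the field in later searches.
	Other custom scalar fields that need a Filter index	fieldName	Custom extension field, for example author or tags.
		fieldType	Data type of the custom field.
		indexType	Index type of the custom field; shown as filter.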
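The chunk_splitter pattern described above is an ordinary regular expression, so its effect can be previewed locally. The sketch below only illustrates what splitting on two or more consecutive newlines does to a small QA-style text; it does not reproduce the service's actual splitting pipeline, and split_chunks is a hypothetical helper.

```python
import re

# Two or more consecutive newlines, as in the chunk_splitter example.
CHUNK_SPLITTER = r"\n{2,}"


def split_chunks(text: str, pattern: str = CHUNK_SPLITTER) -> list[str]:
    """Split text on the pattern and drop empty or whitespace-only pieces."""
    return [part.strip() for part in re.split(pattern, text) if part.strip()]


qa_text = (
    "Q: What is a vector database?\n"
    "A: A database that stores embeddings.\n"
    "\n"
    "Q: What is a CollectionView?\n"
    "A: A collection in an AI database.\n"
    "\n\n"
    "Q: How do I query it?\n"
    "A: Call describe_collection_view()."
)
chunks = split_chunks(qa_text)
for chunk in chunks:
    print(repr(chunk))
```

Each question stays together with its answer because the single newline between them does not match the pattern.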


