videx是字节最近开源的MySQL虚拟索引工具。项目地址 https://github.com/bytedance/videx
具体的可以看官方文档。
这里只列出实操演示过程。
系统版本: centos7
IP: 192.168.31.181
docker pull kangrongme/videx:latest
docker run -d -p 13308:13308 -p 5001:5001 --name videx kangrongme/videx:latest
# 克隆代码
cd /root/
VIDEX_HOME=videx_server
git clone https://github.com/bytedance/videx.git $VIDEX_HOME
cd $VIDEX_HOME
# 创建python虚拟环境
conda create -n videx_py39 python=3.9
conda activate videx_py39
# 安装扩展
python3.9 -m pip install -e . --use-pep517
cd $VIDEX_HOME
cd data/tpch_tiny
tar xf tpch_tiny.sql.tar.gz
mysql -h192.168.31.181 -P3306 -udts -p123456 -e "create database tpch_tiny;"
mysql -h192.168.31.181 -P3306 -udts -p123456 -Dtpch_tiny < tpch_tiny.sql
cd $VIDEX_HOME
-- 这里的videx用的是docker镜像,导入到videx_tpch_tiny这个库里(会自动创建),默认的账号密码是videx和password
python src/sub_platforms/sql_opt/videx/scripts/videx_build_env.py --target 192.168.31.181:3306:tpch_tiny:dts:123456 --videx 192.168.31.181:13308:videx_tpch_tiny:videx:password
日志类似如下:
/usr/local/miniconda3/envs/videx/lib/python3.11/site-packages/pydantic/_internal/_fields.py:192: UserWarning: Field name "schema" in "MySQLConnectionConfig" shadows an attribute in parent "BaseModel"
warnings.warn(
logging config: {'version': 1, 'formatters': {'default': {'format': '%(asctime)s [%(process)d:%(thread)d] %(levelname)-8s %(name)-15s [%(filename)s:%(lineno)d] %(_videx_trace_id)s %(message)s', 'datefmt': '%Y-%m-%d %H:%M:%S'}}, 'filters': {'videx_trace_filter': {'()': <class 'sub_platforms.sql_opt.videx.videx_logging.VidexTraceIdFilter'>}}, 'handlers': {'console': {'class': 'logging.StreamHandler', 'formatter': 'default', 'level': 'INFO', 'stream': 'ext://sys.stdout', 'filters': ['videx_trace_filter']}, 'info_log_file': {'class': 'logging.handlers.RotatingFileHandler', 'formatter': 'default', 'filename': './log/videx_app_info.log', 'maxBytes': 100000000, 'backupCount': 5, 'filters': ['videx_trace_filter']}, 'error_log_file': {'class': 'logging.handlers.RotatingFileHandler', 'formatter': 'default', 'filename': './log/videx_app_error.log', 'maxBytes': 100000000, 'backupCount': 5, 'filters': ['videx_trace_filter']}}, 'root': {'level': 'NOTSET', 'handlers': ['console', 'info_log_file'], 'propagate': False}, 'loggers': {'error_logger': {'level': 'ERROR', 'handlers': ['error_log_file'], 'propagate': True}}}
2025-05-08 14:05:02 [55657:140020204398400] INFO root [videx_build_env.py:145] - metadata file is videx_metadata_tpch_tiny.json
2025-05-08 14:05:02 [55657:140020204398400] INFO root [videx_metadata.py:594] - fetch_all_meta_for_videx. target_db='tpch_tiny' result_dir='temp_meta_1746684302' n_buckets=16 hist_force=True hist_mem_size=200000000 all_table_names=None
2025-05-08 14:05:04 [55657:140020204398400] INFO root [videx_metadata.py:617] - fetch meta for videx: ndv_single_file = 'videx_tpch_tiny_ndv_single.json' not found in result_dir = 'temp_meta_1746684302', or exist ndv single is not enough.fetch it: sorted(ndv_single_dict.keys()) = [] miss_ndv_tables=['customer', 'lineitem', 'nation', 'orders', 'part', 'partsupp', 'region', 'sbtest1', 'sbtest2', 'supplier']
2025-05-08 14:05:04 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for customer.tpch_tiny.customer.C_ACCTBAL [0/8]: SELECT COUNT(DISTINCT C_ACCTBAL) FROM `tpch_tiny`.`customer`;
2025-05-08 14:05:04 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for customer.tpch_tiny.customer.C_ADDRESS [1/8]: SELECT COUNT(DISTINCT C_ADDRESS) FROM `tpch_tiny`.`customer`;
2025-05-08 14:05:04 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for customer.tpch_tiny.customer.C_COMMENT [2/8]: SELECT COUNT(DISTINCT C_COMMENT) FROM `tpch_tiny`.`customer`;
2025-05-08 14:05:04 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for customer.tpch_tiny.customer.C_CUSTKEY [3/8]: SELECT COUNT(DISTINCT C_CUSTKEY) FROM `tpch_tiny`.`customer`;
2025-05-08 14:05:04 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for customer.tpch_tiny.customer.C_MKTSEGMENT [4/8]: SELECT COUNT(DISTINCT C_MKTSEGMENT) FROM `tpch_tiny`.`customer`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for orders.tpch_tiny.orders.O_ORDERKEY [4/9]: SELECT COUNT(DISTINCT O_ORDERKEY) FROM `tpch_tiny`.`orders`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for orders.tpch_tiny.orders.O_ORDERPRIORITY [5/9]: SELECT COUNT(DISTINCT O_ORDERPRIORITY) FROM `tpch_tiny`.`orders`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for orders.tpch_tiny.orders.O_ORDERSTATUS [6/9]: SELECT COUNT(DISTINCT O_ORDERSTATUS) FROM `tpch_tiny`.`orders`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for part.tpch_tiny.part.P_NAME [4/9]: SELECT COUNT(DISTINCT P_NAME) FROM `tpch_tiny`.`part`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for supplier.tpch_tiny.supplier.S_SUPPKEY [6/7]: SELECT COUNT(DISTINCT S_SUPPKEY) FROM `tpch_tiny`.`supplier`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:639] - fetch meta for videx: hist_file='videx_tpch_tiny_histogram.json' not found in result_dir='temp_meta_1746684302', or exist hist is not enough.fetch it. sorted(hist_dict.keys())=[] miss_hist_tables=['customer', 'lineitem', 'nation', 'orders', 'part', 'partsupp', 'region', 'sbtest1', 'sbtest2', 'supplier']
2025-05-08 14:05:05 [55657:140020204398400] WARNING root [db_variable.py:79] - version not updated, return empty str
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_histogram.py:813] - Generating Histogram for `tpch_tiny`.`customer`.`C_ACCTBAL` with 16 n_buckets
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_histogram.py:813] - Generating Histogram for `tpch_tiny`.`customer`.`C_ADDRESS` with 16 n_buckets
2025-05-08 14:05:06 [55657:140020204398400] INFO root [videx_histogram.py:813] - Generating Histogram for `tpch_tiny`.`customer`.`C_COMMENT` with 16 n_buckets
2025-05-08 14:05:06 [55657:140020204398400] INFO root [videx_histogram.py:813] - Generating Histogram for `tpch_tiny`.`customer`.`C_CUSTKEY` with 16 n_buckets
2025-05-08 14:05:06 [55657:140020204398400] INFO root [videx_histogram.py:765] - Column is covered single uk, force generate: dbname='tpch_tiny', table_name='customer', col_name='C_CUSTKEY'
2025-05-08 14:05:06 [55657:140020204398400] INFO root [videx_histogram.py:813] - Generating Histogram for `tpch_tiny`.`customer`.`C_MKTSEGMENT` with 16 n_buckets
2025-05-08 14:05:06 [55657:140020204398400] INFO root [videx_histogram.py:813] - Generating Histogram for `tpch_tiny`.`customer`.`C_NAME` with 16 n_buckets
/usr/local/miniconda3/envs/videx/lib/python3.11/site-packages/pydantic/_internal/_fields.py:192: UserWarning: Field name "schema" in "MySQLConnectionConfig" shadows an attribute in parent "BaseModel"
warnings.warn(
logging config: {'version': 1, 'formatters': {'default': {'format': '%(asctime)s [%(process)d:%(thread)d] %(levelname)-8s %(name)-15s [%(filename)s:%(lineno)d] %(_videx_trace_id)s %(message)s', 'datefmt': '%Y-%m-%d %H:%M:%S'}}, 'filters': {'videx_trace_filter': {'()': <class 'sub_platforms.sql_opt.videx.videx_logging.VidexTraceIdFilter'>}}, 'handlers': {'console': {'class': 'logging.StreamHandler', 'formatter': 'default', 'level': 'INFO', 'stream': 'ext://sys.stdout', 'filters': ['videx_trace_filter']}, 'info_log_file': {'class': 'logging.handlers.RotatingFileHandler', 'formatter': 'default', 'filename': './log/videx_app_info.log', 'maxBytes': 100000000, 'backupCount': 5, 'filters': ['videx_trace_filter']}, 'error_log_file': {'class': 'logging.handlers.RotatingFileHandler', 'formatter': 'default', 'filename': './log/videx_app_error.log', 'maxBytes': 100000000, 'backupCount': 5, 'filters': ['videx_trace_filter']}}, 'root': {'level': 'NOTSET', 'handlers': ['console', 'info_log_file'], 'propagate': False}, 'loggers': {'error_logger': {'level': 'ERROR', 'handlers': ['error_log_file'], 'propagate': True}}}
2025-05-08 14:05:02 [55657:140020204398400] INFO root [videx_build_env.py:145] - metadata file is videx_metadata_tpch_tiny.json
2025-05-08 14:05:02 [55657:140020204398400] INFO root [videx_metadata.py:594] - fetch_all_meta_for_videx. target_db='tpch_tiny' result_dir='temp_meta_1746684302' n_buckets=16 hist_force=True hist_mem_size=200000000 all_table_names=None
2025-05-08 14:05:04 [55657:140020204398400] INFO root [videx_metadata.py:617] - fetch meta for videx: ndv_single_file = 'videx_tpch_tiny_ndv_single.json' not found in result_dir = 'temp_meta_1746684302', or exist ndv single is not enough.fetch it: sorted(ndv_single_dict.keys()) = [] miss_ndv_tables=['customer', 'lineitem', 'nation', 'orders', 'part', 'partsupp', 'region', 'sbtest1', 'sbtest2', 'supplier']
2025-05-08 14:05:04 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for customer.tpch_tiny.customer.C_ACCTBAL [0/8]: SELECT COUNT(DISTINCT C_ACCTBAL) FROM `tpch_tiny`.`customer`;
2025-05-08 14:05:04 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for customer.tpch_tiny.customer.C_ADDRESS [1/8]: SELECT COUNT(DISTINCT C_ADDRESS) FROM `tpch_tiny`.`customer`;
2025-05-08 14:05:04 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for customer.tpch_tiny.customer.C_COMMENT [2/8]: SELECT COUNT(DISTINCT C_COMMENT) FROM `tpch_tiny`.`customer`;
2025-05-08 14:05:04 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for customer.tpch_tiny.customer.C_CUSTKEY [3/8]: SELECT COUNT(DISTINCT C_CUSTKEY) FROM `tpch_tiny`.`customer`;
2025-05-08 14:05:04 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for customer.tpch_tiny.customer.C_MKTSEGMENT [4/8]: SELECT COUNT(DISTINCT C_MKTSEGMENT) FROM `tpch_tiny`.`customer`;
2025-05-08 14:05:04 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for customer.tpch_tiny.customer.C_NAME [5/8]: SELECT COUNT(DISTINCT C_NAME) FROM `tpch_tiny`.`customer`;
2025-05-08 14:05:04 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for customer.tpch_tiny.customer.C_NATIONKEY [6/8]: SELECT COUNT(DISTINCT C_NATIONKEY) FROM `tpch_tiny`.`customer`;
2025-05-08 14:05:04 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for customer.tpch_tiny.customer.C_PHONE [7/8]: SELECT COUNT(DISTINCT C_PHONE) FROM `tpch_tiny`.`customer`;
2025-05-08 14:05:04 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for lineitem.tpch_tiny.lineitem.L_COMMENT [0/17]: SELECT COUNT(DISTINCT L_COMMENT) FROM `tpch_tiny`.`lineitem`;
2025-05-08 14:05:04 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for lineitem.tpch_tiny.lineitem.L_COMMITDATE [1/17]: SELECT COUNT(DISTINCT L_COMMITDATE) FROM `tpch_tiny`.`lineitem`;
2025-05-08 14:05:04 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for lineitem.tpch_tiny.lineitem.L_DISCOUNT [2/17]: SELECT COUNT(DISTINCT L_DISCOUNT) FROM `tpch_tiny`.`lineitem`;
2025-05-08 14:05:04 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for lineitem.tpch_tiny.lineitem.L_EXTENDEDPRICE [3/17]: SELECT COUNT(DISTINCT L_EXTENDEDPRICE) FROM `tpch_tiny`.`lineitem`;
2025-05-08 14:05:04 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for lineitem.tpch_tiny.lineitem.L_ID [4/17]: SELECT COUNT(DISTINCT L_ID) FROM `tpch_tiny`.`lineitem`;
2025-05-08 14:05:04 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for lineitem.tpch_tiny.lineitem.L_LINENUMBER [5/17]: SELECT COUNT(DISTINCT L_LINENUMBER) FROM `tpch_tiny`.`lineitem`;
2025-05-08 14:05:04 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for lineitem.tpch_tiny.lineitem.L_LINESTATUS [6/17]: SELECT COUNT(DISTINCT L_LINESTATUS) FROM `tpch_tiny`.`lineitem`;
2025-05-08 14:05:04 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for lineitem.tpch_tiny.lineitem.L_ORDERKEY [7/17]: SELECT COUNT(DISTINCT L_ORDERKEY) FROM `tpch_tiny`.`lineitem`;
2025-05-08 14:05:04 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for lineitem.tpch_tiny.lineitem.L_PARTKEY [8/17]: SELECT COUNT(DISTINCT L_PARTKEY) FROM `tpch_tiny`.`lineitem`;
2025-05-08 14:05:04 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for lineitem.tpch_tiny.lineitem.L_QUANTITY [9/17]: SELECT COUNT(DISTINCT L_QUANTITY) FROM `tpch_tiny`.`lineitem`;
2025-05-08 14:05:04 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for lineitem.tpch_tiny.lineitem.L_RECEIPTDATE [10/17]: SELECT COUNT(DISTINCT L_RECEIPTDATE) FROM `tpch_tiny`.`lineitem`;
2025-05-08 14:05:04 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for lineitem.tpch_tiny.lineitem.L_RETURNFLAG [11/17]: SELECT COUNT(DISTINCT L_RETURNFLAG) FROM `tpch_tiny`.`lineitem`;
2025-05-08 14:05:04 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for lineitem.tpch_tiny.lineitem.L_SHIPDATE [12/17]: SELECT COUNT(DISTINCT L_SHIPDATE) FROM `tpch_tiny`.`lineitem`;
2025-05-08 14:05:04 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for lineitem.tpch_tiny.lineitem.L_SHIPINSTRUCT [13/17]: SELECT COUNT(DISTINCT L_SHIPINSTRUCT) FROM `tpch_tiny`.`lineitem`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for lineitem.tpch_tiny.lineitem.L_SHIPMODE [14/17]: SELECT COUNT(DISTINCT L_SHIPMODE) FROM `tpch_tiny`.`lineitem`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for lineitem.tpch_tiny.lineitem.L_SUPPKEY [15/17]: SELECT COUNT(DISTINCT L_SUPPKEY) FROM `tpch_tiny`.`lineitem`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for lineitem.tpch_tiny.lineitem.L_TAX [16/17]: SELECT COUNT(DISTINCT L_TAX) FROM `tpch_tiny`.`lineitem`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for nation.tpch_tiny.nation.N_COMMENT [0/4]: SELECT COUNT(DISTINCT N_COMMENT) FROM `tpch_tiny`.`nation`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for nation.tpch_tiny.nation.N_NAME [1/4]: SELECT COUNT(DISTINCT N_NAME) FROM `tpch_tiny`.`nation`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for nation.tpch_tiny.nation.N_NATIONKEY [2/4]: SELECT COUNT(DISTINCT N_NATIONKEY) FROM `tpch_tiny`.`nation`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for nation.tpch_tiny.nation.N_REGIONKEY [3/4]: SELECT COUNT(DISTINCT N_REGIONKEY) FROM `tpch_tiny`.`nation`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for orders.tpch_tiny.orders.O_CLERK [0/9]: SELECT COUNT(DISTINCT O_CLERK) FROM `tpch_tiny`.`orders`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for orders.tpch_tiny.orders.O_COMMENT [1/9]: SELECT COUNT(DISTINCT O_COMMENT) FROM `tpch_tiny`.`orders`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for orders.tpch_tiny.orders.O_CUSTKEY [2/9]: SELECT COUNT(DISTINCT O_CUSTKEY) FROM `tpch_tiny`.`orders`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for orders.tpch_tiny.orders.O_ORDERDATE [3/9]: SELECT COUNT(DISTINCT O_ORDERDATE) FROM `tpch_tiny`.`orders`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for orders.tpch_tiny.orders.O_ORDERKEY [4/9]: SELECT COUNT(DISTINCT O_ORDERKEY) FROM `tpch_tiny`.`orders`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for orders.tpch_tiny.orders.O_ORDERPRIORITY [5/9]: SELECT COUNT(DISTINCT O_ORDERPRIORITY) FROM `tpch_tiny`.`orders`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for orders.tpch_tiny.orders.O_ORDERSTATUS [6/9]: SELECT COUNT(DISTINCT O_ORDERSTATUS) FROM `tpch_tiny`.`orders`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for orders.tpch_tiny.orders.O_SHIPPRIORITY [7/9]: SELECT COUNT(DISTINCT O_SHIPPRIORITY) FROM `tpch_tiny`.`orders`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for orders.tpch_tiny.orders.O_TOTALPRICE [8/9]: SELECT COUNT(DISTINCT O_TOTALPRICE) FROM `tpch_tiny`.`orders`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for part.tpch_tiny.part.P_BRAND [0/9]: SELECT COUNT(DISTINCT P_BRAND) FROM `tpch_tiny`.`part`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for part.tpch_tiny.part.P_COMMENT [1/9]: SELECT COUNT(DISTINCT P_COMMENT) FROM `tpch_tiny`.`part`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for part.tpch_tiny.part.P_CONTAINER [2/9]: SELECT COUNT(DISTINCT P_CONTAINER) FROM `tpch_tiny`.`part`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for part.tpch_tiny.part.P_MFGR [3/9]: SELECT COUNT(DISTINCT P_MFGR) FROM `tpch_tiny`.`part`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for part.tpch_tiny.part.P_NAME [4/9]: SELECT COUNT(DISTINCT P_NAME) FROM `tpch_tiny`.`part`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for part.tpch_tiny.part.P_PARTKEY [5/9]: SELECT COUNT(DISTINCT P_PARTKEY) FROM `tpch_tiny`.`part`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for part.tpch_tiny.part.P_RETAILPRICE [6/9]: SELECT COUNT(DISTINCT P_RETAILPRICE) FROM `tpch_tiny`.`part`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for part.tpch_tiny.part.P_SIZE [7/9]: SELECT COUNT(DISTINCT P_SIZE) FROM `tpch_tiny`.`part`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for part.tpch_tiny.part.P_TYPE [8/9]: SELECT COUNT(DISTINCT P_TYPE) FROM `tpch_tiny`.`part`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for partsupp.tpch_tiny.partsupp.PS_AVAILQTY [0/6]: SELECT COUNT(DISTINCT PS_AVAILQTY) FROM `tpch_tiny`.`partsupp`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for partsupp.tpch_tiny.partsupp.PS_COMMENT [1/6]: SELECT COUNT(DISTINCT PS_COMMENT) FROM `tpch_tiny`.`partsupp`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for partsupp.tpch_tiny.partsupp.PS_ID [2/6]: SELECT COUNT(DISTINCT PS_ID) FROM `tpch_tiny`.`partsupp`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for partsupp.tpch_tiny.partsupp.PS_PARTKEY [3/6]: SELECT COUNT(DISTINCT PS_PARTKEY) FROM `tpch_tiny`.`partsupp`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for partsupp.tpch_tiny.partsupp.PS_SUPPKEY [4/6]: SELECT COUNT(DISTINCT PS_SUPPKEY) FROM `tpch_tiny`.`partsupp`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for partsupp.tpch_tiny.partsupp.PS_SUPPLYCOST [5/6]: SELECT COUNT(DISTINCT PS_SUPPLYCOST) FROM `tpch_tiny`.`partsupp`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for region.tpch_tiny.region.R_COMMENT [0/3]: SELECT COUNT(DISTINCT R_COMMENT) FROM `tpch_tiny`.`region`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for region.tpch_tiny.region.R_NAME [1/3]: SELECT COUNT(DISTINCT R_NAME) FROM `tpch_tiny`.`region`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for region.tpch_tiny.region.R_REGIONKEY [2/3]: SELECT COUNT(DISTINCT R_REGIONKEY) FROM `tpch_tiny`.`region`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for sbtest1.tpch_tiny.sbtest1.c [0/4]: SELECT COUNT(DISTINCT c) FROM `tpch_tiny`.`sbtest1`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for sbtest1.tpch_tiny.sbtest1.id [1/4]: SELECT COUNT(DISTINCT id) FROM `tpch_tiny`.`sbtest1`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for sbtest1.tpch_tiny.sbtest1.k [2/4]: SELECT COUNT(DISTINCT k) FROM `tpch_tiny`.`sbtest1`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for sbtest1.tpch_tiny.sbtest1.pad [3/4]: SELECT COUNT(DISTINCT pad) FROM `tpch_tiny`.`sbtest1`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for sbtest2.tpch_tiny.sbtest2.aaa [0/5]: SELECT COUNT(DISTINCT aaa) FROM `tpch_tiny`.`sbtest2`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for sbtest2.tpch_tiny.sbtest2.c [1/5]: SELECT COUNT(DISTINCT c) FROM `tpch_tiny`.`sbtest2`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for sbtest2.tpch_tiny.sbtest2.id [2/5]: SELECT COUNT(DISTINCT id) FROM `tpch_tiny`.`sbtest2`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for sbtest2.tpch_tiny.sbtest2.k [3/5]: SELECT COUNT(DISTINCT k) FROM `tpch_tiny`.`sbtest2`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for sbtest2.tpch_tiny.sbtest2.pad [4/5]: SELECT COUNT(DISTINCT pad) FROM `tpch_tiny`.`sbtest2`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for supplier.tpch_tiny.supplier.S_ACCTBAL [0/7]: SELECT COUNT(DISTINCT S_ACCTBAL) FROM `tpch_tiny`.`supplier`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for supplier.tpch_tiny.supplier.S_ADDRESS [1/7]: SELECT COUNT(DISTINCT S_ADDRESS) FROM `tpch_tiny`.`supplier`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for supplier.tpch_tiny.supplier.S_COMMENT [2/7]: SELECT COUNT(DISTINCT S_COMMENT) FROM `tpch_tiny`.`supplier`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for supplier.tpch_tiny.supplier.S_NAME [3/7]: SELECT COUNT(DISTINCT S_NAME) FROM `tpch_tiny`.`supplier`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for supplier.tpch_tiny.supplier.S_NATIONKEY [4/7]: SELECT COUNT(DISTINCT S_NATIONKEY) FROM `tpch_tiny`.`supplier`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for supplier.tpch_tiny.supplier.S_PHONE [5/7]: SELECT COUNT(DISTINCT S_PHONE) FROM `tpch_tiny`.`supplier`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:406] - Fetch NDV for supplier.tpch_tiny.supplier.S_SUPPKEY [6/7]: SELECT COUNT(DISTINCT S_SUPPKEY) FROM `tpch_tiny`.`supplier`;
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_metadata.py:639] - fetch meta for videx: hist_file='videx_tpch_tiny_histogram.json' not found in result_dir='temp_meta_1746684302', or exist hist is not enough.fetch it. sorted(hist_dict.keys())=[] miss_hist_tables=['customer', 'lineitem', 'nation', 'orders', 'part', 'partsupp', 'region', 'sbtest1', 'sbtest2', 'supplier']
2025-05-08 14:05:05 [55657:140020204398400] WARNING root [db_variable.py:79] - version not updated, return empty str
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_histogram.py:813] - Generating Histogram for `tpch_tiny`.`customer`.`C_ACCTBAL` with 16 n_buckets
2025-05-08 14:05:05 [55657:140020204398400] INFO root [videx_histogram.py:813] - Generating Histogram for `tpch_tiny`.`customer`.`C_ADDRESS` with 16 n_buckets
2025-05-08 14:05:06 [55657:140020204398400] INFO root [videx_histogram.py:813] - Generating Histogram for `tpch_tiny`.`customer`.`C_COMMENT` with 16 n_buckets
2025-05-08 14:05:06 [55657:140020204398400] INFO root [videx_histogram.py:813] - Generating Histogram for `tpch_tiny`.`customer`.`C_CUSTKEY` with 16 n_buckets
2025-05-08 14:05:06 [55657:140020204398400] INFO root [videx_histogram.py:765] - Column is covered single uk, force generate: dbname='tpch_tiny', table_name='customer', col_name='C_CUSTKEY'
2025-05-08 14:05:06 [55657:140020204398400] INFO root [videx_histogram.py:813] - Generating Histogram for `tpch_tiny`.`customer`.`C_MKTSEGMENT` with 16 n_buckets
2025-05-08 14:05:23 [55657:140020204398400] INFO root [videx_histogram.py:813] - Generating Histogram for `tpch_tiny`.`lineitem`.`L_EXTENDEDPRICE` with 16 n_buckets
2025-05-08 14:05:28 [55657:140020204398400] INFO root [videx_histogram.py:813] - Generating Histogram for `tpch_tiny`.`lineitem`.`L_ID` with 16 n_buckets
2025-05-08 14:05:28 [55657:140020204398400] INFO root [videx_histogram.py:765] - Column is covered single uk, force generate: dbname='tpch_tiny', table_name='lineitem', col_name='L_ID'
2025-05-08 14:05:55 [55657:140020204398400] INFO root [videx_histogram.py:813] - Generating Histogram for `tpch_tiny`.`lineitem`.`L_RECEIPTDATE` with 16 n_buckets
2025-05-08 14:06:00 [55657:140020204398400] INFO root [videx_histogram.py:813] - Generating Histogram for `tpch_tiny`.`lineitem`.`L_RETURNFLAG` with 16 n_buckets
2025-05-08 14:06:05 [55657:140020204398400] INFO root [videx_histogram.py:813] - Generating Histogram for `tpch_tiny`.`lineitem`.`L_SHIPDATE` with 16 n_buckets
2025-05-08 14:06:11 [55657:140020204398400] INFO root [videx_histogram.py:813] - Generating Histogram for `tpch_tiny`.`lineitem`.`L_SHIPINSTRUCT` with 16 n_buckets
在执行上面的python命令的时候,我提前开启了mysql的general_log日志,发现它底层执行了这些命令(这里只摘录了部分命令):
SELECT TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE, ENGINE,
VERSION, ROW_FORMAT, TABLE_ROWS, AVG_ROW_LENGTH, DATA_LENGTH,
MAX_DATA_LENGTH, INDEX_LENGTH, DATA_FREE, AUTO_INCREMENT,
CREATE_TIME, UPDATE_TIME, CHECK_TIME, TABLE_COLLATION,
CHECKSUM, CREATE_OPTIONS, TABLE_COMMENT
FROM information_schema.TABLES
WHERE table_schema = 'tpch_tiny' and ENGINE = 'InnoDB';
show table status in `tpch_tiny` like 'customer';
select table_schema, table_name, column_name, ordinal_position, is_nullable,
data_type, character_maximum_length, character_octet_length, numeric_precision,
numeric_scale, datetime_precision, character_set_name, collation_name,
column_type, column_key, extra
from information_schema.columns
where table_schema='tpch_tiny' and table_name='customer';
select table_schema as dbname, table_name as table_name, index_name as index_name,
non_unique as non_unique, seq_in_index as seq_in_index,
column_name as column_name, cardinality as cardinality,
sub_part as sub_part, is_visible as is_visible,
expression as expression, collation as collation, index_type as index_type
from information_schema.statistics
where table_schema = 'tpch_tiny' and table_name='customer';
select TABLE_NAME, N_ROWS,CLUSTERED_INDEX_SIZE, SUM_OF_OTHER_INDEX_SIZES from `mysql`.`innodb_table_stats` where database_name='tpch_tiny';
select n_rows, clustered_index_size, sum_of_other_index_sizes from mysql.innodb_table_stats where database_name='tpch_tiny' and table_name='customer';
show create table `tpch_tiny`.`customer`;
SELECT COUNT(DISTINCT P_COMMENT) FROM `tpch_tiny`.`part`;
-- 直方图信息
ANALYZE TABLE `tpch_tiny`.`customer` UPDATE HISTOGRAM ON C_ACCTBAL WITH 16 BUCKETS;
ANALYZE TABLE `tpch_tiny`.`customer` DROP HISTOGRAM ON C_ACCTBAL;
-- 数据抽样
SELECT MIN(C_CUSTKEY) as min, MAX(C_CUSTKEY) as max FROM tpch_tiny.customer;
SELECT COUNT(1) FROM tpch_tiny.customer WHERE C_CUSTKEY IS NULL;
SELECT COUNT(1) FROM tpch_tiny.customer;
SELECT COUNT(1) as bucket_count, COUNT(DISTINCT C_CUSTKEY) as bucket_ndv,
MIN(C_CUSTKEY) as actual_min, MAX(C_CUSTKEY) as actual_max
FROM tpch_tiny.customer
WHERE C_CUSTKEY >= 9390 AND C_CUSTKEY < 18755;
登录到docker部署的videx-server的mysql中,查看执行计划:
mysql -h192.168.31.181 -P13308 -uvidex -ppassword -Dvidex_tpch_tiny
-- SET @VIDEX_SERVER='127.0.0.1:5001'; -- Not needed for Docker deployment
-- Connect VIDEX-Optimizer: mysql -h127.0.0.1 -P13308 -uvidex -ppassword -Dvidex_tpch_tiny
USE videx_tpch_tiny;
EXPLAIN
FORMAT = JSON
SELECT s_name, count(*) AS numwait
FROM supplier,
lineitem l1,
orders,
nation
WHERE s_suppkey = l1.l_suppkey
AND o_orderkey = l1.l_orderkey
AND o_orderstatus = 'F'
AND l1.l_receiptdate > l1.l_commitdate
AND EXISTS (SELECT *
FROM lineitem l2
WHERE l2.l_orderkey = l1.l_orderkey
AND l2.l_suppkey <> l1.l_suppkey)
AND NOT EXISTS (SELECT *
FROM lineitem l3
WHERE l3.l_orderkey = l1.l_orderkey
AND l3.l_suppkey <> l1.l_suppkey
AND l3.l_receiptdate > l3.l_commitdate)
AND s_nationkey = n_nationkey
AND n_name = 'IRAQ'
GROUP BY s_name
ORDER BY numwait DESC, s_name;
然后,再把上面的这个explain的sql放到真实的3306的mysql里面执行下,对比二者的explain开销情况,可以看到差别并不大。
此外,可以再做如下的实验:
1、采集完数据后,在 13308这个Videx数据库实例里,执行一个加索引后查看执行计划的操作:
> mysql -h127.0.0.1 -P13308 -uvidex -ppassword -Dvidex_tpch_tiny
[videx_tpch_tiny] > alter table lineitem add index idx_2222(L_LINENUMBER);
Query OK, 0 rows affected (0.04 sec)
Records: 0 Duplicates: 0 Warnings: 0
[videx_tpch_tiny] > explain select * from lineitem where L_LINENUMBER=3 ;
+----+-------------+----------+------------+------+---------------+----------+---------+-------+-------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+----------+---------+-------+-------+----------+-------+
| 1 | SIMPLE | lineitem | NULL | ref | idx_2222 | idx_2222 | 4 | const | 10724 | 100.00 | NULL |
+----+-------------+----------+------------+------+---------------+----------+---------+-------+-------+----------+-------+
1 row in set, 1 warning (0.02 sec)
[videx_tpch_tiny] > select * from lineitem where L_LINENUMBER=3 ; -- videx不支持真实的查询操作
ERROR 1031 (HY000): Table storage engine for 'lineitem' doesn't have this option
2、在真实MySQL上执行上述操作
[tpch_tiny] > explain select * from lineitem where L_LINENUMBER=3 ;
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
| 1 | SIMPLE | lineitem | NULL | ALL | NULL | NULL | NULL | NULL | 59231 | 10.00 | Using where |
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
[tpch_tiny] > alter table lineitem add index idx_2222(L_LINENUMBER);
Query OK, 0 rows affected (0.26 sec)
Records: 0 Duplicates: 0 Warnings: 0
[tpch_tiny] > explain select * from lineitem where L_LINENUMBER=3 ;
+----+-------------+----------+------------+------+---------------+----------+---------+-------+-------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+----------+---------+-------+-------+----------+-------+
| 1 | SIMPLE | lineitem | NULL | ref | idx_2222 | idx_2222 | 4 | const | 10795 | 100.00 | NULL |
+----+-------------+----------+------------+------+---------------+----------+---------+-------+-------+----------+-------+
1 row in set, 1 warning (0.00 sec)
3、可以看到videx的仿真效果还是不错的。
原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。
如有侵权,请联系 cloudcommunity@tencent.com 删除。
原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。
如有侵权,请联系 cloudcommunity@tencent.com 删除。