
A Machine Learning-Based Approximation of Strong Branching

短短的路走走停停 · Published 2021-08-12 · Column: 程序猿声

Paper-reading notes reflecting my personal understanding; if you spot any mistakes, corrections are greatly appreciated! This paper falls under the category of machine learning alongside optimization algorithms.

01 Introduction

Most current mixed-integer programming (MIP) solvers are based on the branch-and-bound (B&B) algorithm. In recent years, many new features, such as cutting planes, presolve, heuristics, and advanced branching strategies, have been added to MIP solvers to improve solving efficiency.

The branching strategy has a major impact on MIP solving performance and has been studied extensively in the literature.

  • The simplest branching strategy is most-infeasible branching, which at each step selects the variable whose fractional part is closest to 0.5.
  • Pseudocost branching records the dual bound increases observed during past branchings and uses this history to estimate the dual bound increase of each candidate variable. Its drawback is that little information is available at the beginning of the search.
  • Strong branching overcomes this drawback: it explicitly evaluates the dual bound increases by solving the linear programming (LP) relaxations obtained from branching on each fractional variable, and then selects the variable with the largest increase. Its effectiveness is evident, but with many candidate variables per node and each LP relaxation taking considerable time to solve, strong branching is too slow in practice (a sketch of its scoring loop follows this list).
  • There are of course many other branching strategies, such as reliability branching, inference branching, and nonchimerical branching; interested readers can consult the literature.
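To make the expense of strong branching concrete, here is a minimal Python sketch of its scoring loop. It is an illustration only, not the paper's implementation: `problem`, `lp_solution`, and `solve_lp_relaxation` are hypothetical placeholders, and the product scoring rule is just one common way to combine the two dual bound increases.

```python
import math

def strong_branching_select(problem, lp_solution, fractional_vars, solve_lp_relaxation):
    """Select the branching variable that maximizes the dual bound increase.

    `problem`, `lp_solution`, and `solve_lp_relaxation` are hypothetical
    placeholders: solve_lp_relaxation(problem, var, direction) is assumed to
    return the optimal objective of the corresponding child LP relaxation.
    """
    z = lp_solution.objective            # LP bound at the current node
    best_var, best_score = None, -math.inf
    for var in fractional_vars:
        # Solving both child LPs explicitly is what makes strong branching
        # accurate but expensive.
        z_down = solve_lp_relaxation(problem, var, "down")  # x_var <= floor(x*)
        z_up = solve_lp_relaxation(problem, var, "up")      # x_var >= ceil(x*)
        # Product rule: one common way to combine the two increases.
        score = max(z_down - z, 1e-6) * max(z_up - z, 1e-6)
        if score > best_score:
            best_var, best_score = var, score
    return best_var
```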

The contribution of this paper is to learn an approximation of strong branching with machine learning and to integrate the learned model into the B&B framework in order to speed up the solving process.

The binary MILP problems addressed in this paper have the following form:

$$\min_{x} \; c^{\top} x \quad \text{s.t.} \quad Ax \le b, \qquad x_i \in \{0,1\} \;\; \forall i \in I, \qquad x_i \ge 0 \;\; \forall i \in C,$$

where $c \in \mathbb{R}^{n}$, $A \in \mathbb{R}^{m \times n}$, and $b \in \mathbb{R}^{m}$ denote the cost vector, the coefficient matrix, and the right-hand side, respectively, and $I$ and $C$ are the index sets of the integer and continuous variables.

02 Learning Branching Decisions

This paper does not propose a new branching heuristic; rather, it uses machine learning to construct a branching strategy that imitates the decisions generated by another strategy. The goal is to create a more efficient approximation of strong branching.

2.1 Data Set Generation

Machine learning starts by collecting data pairs $(\phi_i, y_i)$, where $\phi_i$ is the feature vector of a candidate branching variable at the current node and $y_i$ is the label associated with those features; the model is then trained on these pairs. The data can be collected by solving the training problems with strong branching and recording the intermediate process, specifically the features and the label of each branching candidate at every node. These data finally serve as the input on which the machine learning algorithm trains the model.
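A minimal sketch of this data-collection step, under stated assumptions: `branch_and_bound` and `compute_features` are hypothetical placeholders for a B&B driver with a branching callback and for the feature map of Section 3, and the label is taken to be the strong branching score of each candidate.

```python
def collect_training_pairs(training_problems, branch_and_bound, compute_features):
    """Collect (phi_i, y_i) pairs by running strong branching on training problems.

    `branch_and_bound` and `compute_features` are hypothetical placeholders:
    the driver is assumed to invoke `callback` with the candidate variables and
    their strong branching scores at every branching decision.
    """
    dataset = []

    def on_branching(node, candidates, sb_scores):
        # One sample per candidate variable at this node: features + label.
        for var, score in zip(candidates, sb_scores):
            dataset.append((compute_features(node, var), score))

    for problem in training_problems:
        branch_and_bound(problem, strategy="strong_branching", callback=on_branching)
    return dataset
```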

2.2 Machine Learning Algorithm

The algorithm used in this paper is Extremely Randomized Trees, or ExtraTrees. ExtraTrees is a direct modification of random forests, and it was chosen because it is quite robust to the values of its parameters.

03 Features Describing the Current Subproblem

Features are extremely important for machine learning algorithms; they play a key role in the effectiveness of the method. On the one hand, features need to be complete and precise enough to describe the subproblem accurately. On the other hand, they must be efficient to compute. There is a trade-off between these two aspects, because some features have a significant effect on the quality of the method but are very expensive to compute. For example, the increase of the LP objective value obtained when branching on a variable is an excellent feature, and it is exactly what strong branching uses, but computing it is far too costly, so it is unsuitable for the method of this paper.

As for which features are chosen and the considerations behind them, the authors provide an extensive discussion; interested readers can consult the paper. The detailed feature description below is important, so it is reproduced directly from the original paper.

3.1 Static Problem Features

  • The first set of features are computed from the sole parameters $c$, $A$, and $b$. They are calculated once and for all, and they represent the static state of the problem. Their goal is to give an overall description of the problem.

  • Besides the sign of the element $c_i$, we also use $|c_i| / \sum_{j:c_j \ge 0}|c_j|$ and $|c_i| / \sum_{j:c_j < 0}|c_j|$. Distinguishing both is important, because the sign of the coefficient in the cost function is of utmost importance for evaluating the impact of a variable on the objective value.

  • The second class of static features is meant to represent the influence of the coefficients of variable $i$ in the coefficient matrix $A$. We develop three measures, namely $M^1_j$, $M^2_j$, and $M^3_j$, that describe variable $i$ within the problem in terms of the constraint $j$. Once the values of the measure $M_j$ are computed, the corresponding features added to the feature vector $\phi_i$ are given by $\min_j M_j$ and $\max_j M_j$. The rationale behind this choice is that, when it comes to describing the constraints of a given problem, only the extreme values are relevant.

  • The first measure $M^1_j$ is composed of two parts: $M^{1+}_j$ computed by $A_{ji}/|b_j|, \forall j$ such that $b_j \ge 0$, and $M^{1-}_j$ computed by $A_{ji}/|b_j|, \forall j$ such that $b_j < 0$. The minimum and maximum values of $M^{1+}_j$ and $M^{1-}_j$ are used as features, to indicate by how much a variable contributes to the constraint violations.

  • Measure $M^2_j$ models the relationship between the cost of a variable and the coefficients of the same variable in the constraints. Similarly to the first measure, $M^2_j$ is split in $M^{2+}_j = |c_i|/A_{ji}, \forall j$ with $c_i \ge 0$, and $M^{2-}_j = |c_i|/A_{ji}, \forall j$ with $c_i < 0$. As for the previous measure, the feature vector $\phi_i$ contains both the minimum and the maximum values of $M^{2+}_j$ and $M^{2-}_j$.

  • Finally, the third measure $M^3_j$ represents the inter-variable relationships within the constraints. The measure is split into $M^{3+}_j = |A_{ji}|/ \sum_{k:A_{jk} \ge 0}|A_{jk}|$ and $M^{3-}_j = |A_{ji}|/ \sum_{k:A_{jk} < 0}|A_{jk}|$. $M^{3+}_j$ is in turn divided in $M^{3++}_j$ and $M^{3+-}_j$, which are calculated using the formula of $M^{3+}_j$ for $A_{ji} \ge 0$ and $A_{ji} < 0$, respectively. The same splitting is performed for $M^{3-}_j$. Again, the minimum and maximum of the four $M^3_j$ computed for all constraints are added to the features.
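As a worked illustration of the formulas above, the sketch below computes the cost-based features and the min/max of measure $M^1$ for one variable $i$. This is my own reading of Section 3.1, not the authors' code, and the epsilon guard against $b_j = 0$ is my addition.

```python
import numpy as np

def static_features(c, A, b, i):
    """Static features of variable i from (c, A, b) only; my reading of Sec. 3.1."""
    feats = {"sign_ci": float(np.sign(c[i]))}
    pos = np.abs(c[c >= 0]).sum()
    neg = np.abs(c[c < 0]).sum()
    feats["ci_over_pos"] = abs(c[i]) / pos if pos > 0 else 0.0
    feats["ci_over_neg"] = abs(c[i]) / neg if neg > 0 else 0.0

    # Measure M^1: A_ji / |b_j|, split by the sign of b_j; only the extremes
    # (min and max) enter the feature vector. The guard for b_j = 0 is mine.
    for name, mask in (("M1_plus", b >= 0), ("M1_minus", b < 0)):
        vals = A[mask, i] / np.maximum(np.abs(b[mask]), 1e-12)
        feats[name + "_min"] = vals.min() if vals.size else 0.0
        feats[name + "_max"] = vals.max() if vals.size else 0.0
    return feats

# Tiny usage example with made-up data:
c = np.array([3.0, -1.0, 2.0])
A = np.array([[1.0, 2.0, 0.5],
              [0.0, 1.0, -1.0]])
b = np.array([4.0, -2.0])
print(static_features(c, A, b, i=0))
```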

3.2 Dynamic Problem Features

The second type of features are related to the solution of the problem at the current B&B node. Those features contain the proportion of fixed variables at the current solution, the up and down fractionalities of variable $i$, the up and down Driebeek penalties (Driebeek 1966) corresponding to variable $i$, normalized by the objective value at the current node, and the sensitivity range of the objective function coefficient of variable $i$, also normalized by $|c_i|$.
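As a small concrete example, the up and down fractionalities in this list can be computed directly from the variable's LP value at the current node. This is my formulation of that single feature, with `x_star` an assumed input.

```python
import math

def fractionalities(x_star):
    """Up and down fractionality of a variable's LP value at the current node."""
    down = x_star - math.floor(x_star)  # distance to rounding down
    up = math.ceil(x_star) - x_star     # distance to rounding up
    return up, down

print(fractionalities(2.3))  # -> (0.7, 0.3)
```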

3.3 Dynamic Optimization Features

The last set of features is meant to represent the overall state of the optimization. These features summarize global information that is not available from the single current node. When branching is performed on a variable, the objective increases are stored for that variable. From these numbers, we extract statistics for each variable: the minimum, the maximum, the mean, the standard deviation, and the quartiles of the objective increases. These statistics are used as features to describe the variable for which they were computed.

As those features should be independent of the scale of the problem, we divide each objective increase by the objective value at the current node, such that the computed statistics correspond to the relative objective increase for each variable. Finally, the last feature added to this subset is the number of times variable $i$ has been chosen as a branching variable, normalized by the total number of branchings performed.
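A short sketch of these per-variable statistics, assuming `increases` holds the objective increases recorded for one variable and `z_node` is the (nonzero) objective value at the current node:

```python
import numpy as np

def increase_statistics(increases, z_node):
    """Per-variable statistics of relative objective increases (Sec. 3.3)."""
    # Dividing by the node objective makes the statistics scale independent;
    # z_node is assumed to be nonzero here.
    rel = np.asarray(increases, dtype=float) / z_node
    return {
        "min": rel.min(), "max": rel.max(),
        "mean": rel.mean(), "std": rel.std(),
        "q1": np.percentile(rel, 25),
        "median": np.percentile(rel, 50),
        "q3": np.percentile(rel, 75),
    }
```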

04 Experiments

  • (Step 1) We generate a data set $D_{sb}$ using strong branching,
  • (Step 2) we learn a branching heuristic from it, and
  • (Step 3) we compare the learned heuristic with other branching strategies on various problems.
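Tying the three steps together, here is a hedged end-to-end sketch that reuses the hypothetical `collect_training_pairs`, `branch_and_bound`, and `compute_features` helpers from the earlier sketches; it is an illustration of the pipeline, not the authors' code.

```python
from sklearn.ensemble import ExtraTreesRegressor

def run_pipeline(train_problems, branch_and_bound, compute_features):
    """End-to-end sketch of Steps 1-3 (helpers are hypothetical placeholders)."""
    # Step 1: generate the data set D_sb with strong branching.
    D_sb = collect_training_pairs(train_problems, branch_and_bound, compute_features)
    X = [phi for phi, _ in D_sb]
    y = [label for _, label in D_sb]

    # Step 2: learn a branching heuristic from D_sb.
    model = ExtraTreesRegressor(n_estimators=100, min_samples_split=20).fit(X, y)

    # Step 3: use the learned scorer as a branching strategy inside B&B.
    def learned_strategy(node, candidates):
        feats = [compute_features(node, var) for var in candidates]
        return candidates[int(model.predict(feats).argmax())]

    return learned_strategy
```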

4.1 Problem Sets

  • We use two types of problem sets, namely randomly generated problem sets and standard benchmark problems from MIPLIB.
  • The random problem sets are used for both training (steps 1 and 2) and assessing (step 3) the heuristics. MIPLIB problems are only used for assessment (step 3).

The possible constraints are chosen among set covering (SC), multiknapsack (MKN), bin packing (BP), and equality (EQ) constraints. We generated problems that contain constraints of type BP-EQ, BP-SC, and MKN-SC.

As some of those problems are going to be used to generate the training data set, we randomly split each family into a train and a test set. In the end, we have six data sets: BPEQ_train, BPEQ_test, BPSC_train, BPSC_test, MKNSC_train, and MKNSC_test.

The chosen parameters of ExtraTrees are $N = 100$, $k = |\phi|$, and $n_{min} = 20$.
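In scikit-learn terms, these parameters map naturally onto `ExtraTreesRegressor`; the following is a minimal sketch of that mapping under my reading, not necessarily the authors' setup.

```python
from sklearn.ensemble import ExtraTreesRegressor

# Parameter mapping (my reading): N -> n_estimators, k = |phi| -> max_features
# (None means "consider all features at each split"), n_min -> min_samples_split.
model = ExtraTreesRegressor(
    n_estimators=100,       # N = 100 trees
    max_features=None,      # k = |phi|: all attributes examined per split
    min_samples_split=20,   # n_min = 20
)
```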

4.3 Results

We compare our approach to five other branching strategies, namely random branching (random), most-infeasible branching (MIB), nonchimerical branching (NCB) (Fischetti and Monaci 2012), full strong branching (FSB), and reliability branching (RB) (Achterberg et al. 2005).

In these tables (Tables 4, 5, and 7), Cl. Gap refers to the closed gap and S/T indicates the number of problems solved within the provided nodes or time limit versus the total number of problems. Nodes and Time, respectively, represent the number of explored nodes and the time spent (in seconds) before the optimization either finds the optimal solution or stops earlier because of one stopping criterion.

The normal learned branching strategy (i.e., ($n_{min} = 20$, all)) is learned based on a data set containing samples from the three types of training problems (i.e., BPSC, BPEQ, and MKNSC).

  • Those results show that our approach succeeds in efficiently imitating FSB. Indeed, the experiments performed with a limit on the number of nodes show that the closed gap is only 9% smaller, while the time spent is reduced by 85% compared to FSB.
  • The experiments with a time limit show that the reduced time required to take a decision allows the learned strategy to explore more nodes and thus to close the gap further than FSB does. While these results are encouraging, they are still slightly worse than the results obtained with RB, which is both closer to FSB and faster than our approach.
  • We separated the problems that were solved by all methods from the problems that were not solved by at least one of the compared methods.
  • The results obtained with the learned branching strategy are still a little below the results obtained with reliability branching. The results presented here are averaged over all considered problems.

Finally, Tables 6 and 7 report the results from our last set of experiments.

Additionally, in another experiment, we let CPLEX use cuts and heuristics (with default CPLEX parameters) in the course of the optimization to observe their impact on the efficiency of each branching strategy.

  • Overall, our method compares favorably to its competitors when cuts and heuristics are used by CPLEX. Indeed, in that case, our learned branching strategy is the fastest (almost three times faster than the second fastest method (i.e., MIB)) to solve all 30 considered problems.
  • Note that the apparent bad results of RB are due to three problems that are especially hard for that branching heuristic (air04, air05, and mod011).
  • Things appear to be different when cuts and heuristics are not used. Indeed, based on the results of Table 7, our method seems to be very slow, but the large number of nodes and the large amount of time are actually due to a small number of problems for which the method does not work well.
  • A possible explanation for why our method does not perform well on those problems is that these problems, because they are too large, are not well represented in the data set that we use to learn the branching strategy.

Reference

  • [1] Alejandro Marcos Alvarez, Quentin Louveaux, Louis Wehenkel (2017) A Machine Learning-Based Approximation of Strong Branching. INFORMS Journal on Computing 29(1):185-195. http://dx.doi.org/10.1287/ijoc.2016.0723