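The architecture of Kubernetes (k8s) is shown in the figure:

As we all know, k8s is divided into master and node, where:
The master mainly has the following components:
A node mainly contains the following components:

This process looks fairly simple, but scheduling in a real production environment raises many issues that need to be considered:
The scheduling process has two phases: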

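Code location (release 1.10):
https://github.com/kubernetes/kubernetes/tree/release-1.10/pkg/scheduler/algorithm

Priorities (scoring)
The Predicates first filter the nodes and produce a list of candidates. Each node in that list is then scored, and the Pod is scheduled onto the node with the highest score.
The final score of a node is computed with the following formula: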
finalScoreNode = (weight1 * priorityFunc1) + (weight2 * priorityFunc2) + … + (weightn * priorityFuncn)
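The two phases and the weighted sum above can be sketched in a few lines of Python. This is only an illustration of the idea, not the scheduler's actual Go code: the record fields (free_cpu, free_mem, cpu, mem), the predicate fits, the two priority functions and their weights are all hypothetical names made up for this example.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, float]                      # hypothetical pod/node record
Predicate = Callable[[Record, Record], bool]   # hard filter: can the pod run here?
Priority = Callable[[Record, Record], float]   # soft score: how good is this node?


def schedule(pod: Record,
             nodes: Dict[str, Record],
             predicates: List[Predicate],
             priorities: List[Tuple[float, Priority]],
             ) -> Tuple[Optional[str], Dict[str, float]]:
    """Return (best node name or None, score per feasible node)."""
    # Phase 1 (Predicates): keep only nodes that pass every hard check.
    feasible = {name: node for name, node in nodes.items()
                if all(pred(pod, node) for pred in predicates)}
    # Phase 2 (Priorities): weighted sum of the priority functions per node.
    scores = {name: sum(weight * func(pod, node) for weight, func in priorities)
              for name, node in feasible.items()}
    best = max(scores, key=scores.get) if scores else None
    return best, scores


# Hypothetical predicate: the pod's requests must fit into the free resources.
def fits(pod: Record, node: Record) -> bool:
    return pod["cpu"] <= node["free_cpu"] and pod["mem"] <= node["free_mem"]


# Hypothetical priority functions on a 0..10 scale: more headroom, higher score.
def cpu_headroom(pod: Record, node: Record) -> float:
    return 10 * (node["free_cpu"] - pod["cpu"]) / node["free_cpu"]


def mem_headroom(pod: Record, node: Record) -> float:
    return 10 * (node["free_mem"] - pod["mem"]) / node["free_mem"]


nodes = {
    "n1": {"free_cpu": 4, "free_mem": 8},
    "n2": {"free_cpu": 8, "free_mem": 4},
    "n3": {"free_cpu": 1, "free_mem": 8},   # too little CPU, filtered out
}
pod = {"cpu": 2, "mem": 2}

best, scores = schedule(pod, nodes, [fits], [(1, cpu_headroom), (2, mem_headroom)])
print(best, scores)  # n1 {'n1': 20.0, 'n2': 17.5}
```

Node n3 never gets a score because it fails the predicate; among the remaining nodes, the weight of 2 on memory headroom is what lets n1 beat n2.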
Inspecting a node's resource information:
apiVersion: v1
kind: Node
metadata:
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    kubernetes.io/hostname: node-n1
  name: node-n1
spec:
  externalID: node-n1
status:
  addresses:
  - address: 10.162.197.135
    type: InternalIP
  allocatable:
    cpu: "8"
    memory: 16309412Ki
    pods: "110"
  capacity:
    cpu: "8"
    memory: 16411812Ki
    pods: "110"
  conditions: {...}
  daemonEndpoints:
    kubeletEndpoint:
      Port: 10250
  images: {...}
  nodeInfo: {...}
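The scheduler's resource check compares Pod requests against the node's allocatable values, not its capacity. The Python sketch below uses the numbers from the Node above to show the difference. It is a simplified illustration: the parsers handle only a small subset of the Kubernetes quantity syntax (plain integers, the m suffix for CPU, and Ki/Mi/Gi for memory), and the `used` bookkeeping and the example requests are assumptions made up for this example.

```python
BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def parse_memory(quantity: str) -> int:
    """Convert e.g. '16309412Ki' or '2Gi' to bytes (subset of the real syntax)."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[:-len(suffix)]) * factor
    return int(quantity)


def parse_cpu_millis(quantity: str) -> int:
    """Convert e.g. '8' or '500m' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


# Values copied from the Node status shown above.
allocatable = {"cpu": "8", "memory": "16309412Ki", "pods": "110"}
capacity = {"cpu": "8", "memory": "16411812Ki", "pods": "110"}

# Memory held back from Pods on this node: capacity minus allocatable.
reserved_memory = parse_memory(capacity["memory"]) - parse_memory(allocatable["memory"])
print(reserved_memory // 1024 ** 2, "Mi reserved")


def fits(requests: dict, used: dict, allocatable: dict) -> bool:
    """True if the new requests still fit on top of what is already requested."""
    cpu_ok = (used["cpu_millis"] + parse_cpu_millis(requests["cpu"])
              <= parse_cpu_millis(allocatable["cpu"]))
    mem_ok = (used["memory_bytes"] + parse_memory(requests["memory"])
              <= parse_memory(allocatable["memory"]))
    pods_ok = used["pods"] + 1 <= int(allocatable["pods"])
    return cpu_ok and mem_ok and pods_ok


# Hypothetical state: Pods already on the node request 6 CPUs and 12Gi in total.
used = {"cpu_millis": 6000, "memory_bytes": parse_memory("12Gi"), "pods": 20}

small_fits = fits({"cpu": "1500m", "memory": "2Gi"}, used, allocatable)
large_fits = fits({"cpu": "1500m", "memory": "4Gi"}, used, allocatable)
print(small_fits, large_fits)
```

The second request fails even though 12Gi + 4Gi looks like it should fit on a "16G" machine: the allocatable memory is slightly below 16Gi.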
Inspecting the resource fields of a pod:
kubectl explain pod.spec
Let's look at this pod:
Notes:

nodeSelector (the documentation of that era expected it to be deprecated eventually in favor of nodeAffinity, although it is still supported): schedule a Pod onto specific Nodes:
apiVersion: v1
kind: Pod
metadata:
  labels:
    pod-template-hash: "4173307778"
    run: my-pod
  name: my-pod
  namespace: default
spec:
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: my-pod
    ports:
    - containerPort: 80
      protocol: TCP
    resources: {}
  nodeSelector:
    disktype: ssd
    node-flavor: s3.large.2

1. podAffinity: place certain Pods on the same group of Nodes:
apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - S1
        topologyKey: kubernetes.io/zone
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S2
          topologyKey: kubernetes.io/hostname
  containers:
  - name: with-pod-affinity
    image: k8s.gcr.io/pause:2.0

Key differences from nodeAffinity:
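Hard filter: node groups that do not already run the specified Pods are excluded:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
    matchExpressions:
    - key: security
      operator: In
      values:
      - S1
  topologyKey: kubernetes.io/zone

Soft preference: node groups that do not run the specified Pods get a lower score, which reduces the chance that a node from that group is chosen:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
  podAffinityTerm:
    labelSelector:
      matchExpressions:
      - key: security
        operator: In
        values:
        - S2
    topologyKey: kubernetes.io/hostname

2. podAntiAffinity: keep certain Pods from landing on the same group of Nodes: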
apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - S1
        topologyKey: kubernetes.io/zone
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S2
          topologyKey: kubernetes.io/hostname
  containers:
  - name: with-pod-affinity
    image: k8s.gcr.io/pause:2.0

Differences from podAffinity:

3. Taints: keep Pods from being scheduled onto specific Nodes:
apiVersion: v1
kind: Node
metadata:
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    kubernetes.io/hostname: node-n1
  name: node-n1
spec:
  externalID: node-n1
  taints:
  - effect: NoSchedule
    key: accelerator
    timeAdded: null
    value: gpu

Taints can also be added and removed with kubectl (the trailing - in the second command removes the taint):
kubectl taint node node-n1 foo=bar:NoSchedule
kubectl taint node node-n1 foo:NoSchedule-

4. Tolerations: allow Pods to be scheduled onto Nodes that carry specific taints:
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: my-pod
  name: my-pod
  namespace: default
spec:
  containers:
  - name: my-pod
    image: nginx
  tolerations:
  - key: accelerator
    operator: Equal
    value: gpu
    effect: NoSchedule

With this toleration, the Pod can ignore the matching taint on the following Node:
apiVersion: v1
kind: Node
metadata:
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    kubernetes.io/hostname: node-n1
  name: node-n1
spec:
  externalID: node-n1
  taints:
  - effect: NoSchedule
    key: accelerator
    timeAdded: null
    value: gpu

1. nodeName: manually schedule a Pod onto a specific Node:

2. DaemonSet:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: my-daemonset
spec:
  selector:
    matchLabels:
      name: my-daemonset
  template:
    metadata:
      labels:
        name: my-daemonset
    spec:
      containers:
      - name: container
        image: k8s.gcr.io/pause:2.0

In terms of placement, this is roughly equivalent to:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deploy
spec:
  replicas: <# of nodes>
  selector:
    matchLabels:
      podlabel: daemonset
  template:
    metadata:
      labels:
        podlabel: daemonset
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: podlabel
                operator: In
                values:
                - daemonset
            topologyKey: kubernetes.io/hostname
      containers:
      - name: container
        image: k8s.gcr.io/pause:2.0

Check the scheduling result:
kubectl get po podname -o wide

Check why scheduling failed:
kubectl describe po podname

List of scheduling errors:



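Example:
https://kubernetes.io/blog/2017/03/advanced-scheduling-in-kubernetes/

The scheduling discussed so far is about choosing nodes, and the priorities there are node priorities. Pods can have priorities as well: a high-priority Pod is scheduled first, and when resources are insufficient, lower-priority Pods may be sacrificed so that the important Pod gets the resources it needs to be deployed.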
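To define Pod priority, first define a PriorityClass object. This object is not bound to a Namespace; see the example on the official site.

Then set spec.priorityClassName in the Pod to the name of a PriorityClass that has been defined.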
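How a priorityClassName turns into scheduling order and preemption can be sketched in Python. This is a rough model only: the class names, values, Pod names and CPU numbers are hypothetical, and the victim selection is a deliberate simplification (the real scheduler weighs many more factors when it preempts).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical PriorityClass objects, reduced to the fields used here.
priority_classes = {
    "high-priority": {"value": 1000000, "globalDefault": False},
    "low-priority": {"value": 1000, "globalDefault": False},
}


def pod_priority(pod: dict, classes: Dict[str, dict]) -> int:
    """Resolve a Pod's numeric priority from spec.priorityClassName."""
    name = pod.get("spec", {}).get("priorityClassName")
    if name is not None:
        return classes[name]["value"]
    # No class named: use the globalDefault class if one exists, else 0.
    for cls in classes.values():
        if cls.get("globalDefault"):
            return cls["value"]
    return 0


def pick_victims(incoming_priority: int, needed_cpu: int,
                 running: List[Tuple[str, int, int]]) -> Optional[List[str]]:
    """running holds (pod name, priority, cpu). Evict lowest priorities first.

    Returns the Pods to evict, or None if evicting every lower-priority
    Pod would still not free enough CPU.
    """
    candidates = sorted((p for p in running if p[1] < incoming_priority),
                        key=lambda p: p[1])
    victims, freed = [], 0
    for name, _priority, cpu in candidates:
        if freed >= needed_cpu:
            break
        victims.append(name)
        freed += cpu
    return victims if freed >= needed_cpu else None


pending = [
    {"metadata": {"name": "misc"}, "spec": {}},
    {"metadata": {"name": "batch"}, "spec": {"priorityClassName": "low-priority"}},
    {"metadata": {"name": "web"}, "spec": {"priorityClassName": "high-priority"}},
]
# Higher priority first; sorted() is stable, so ties keep their arrival order.
queue = sorted(pending, key=lambda p: pod_priority(p, priority_classes), reverse=True)
order = [p["metadata"]["name"] for p in queue]
print(order)

running = [("batch-1", 1000, 2), ("cache", 1000000, 2), ("misc-1", 0, 1)]
victims = pick_victims(pod_priority(pending[2], priority_classes), 3, running)
print(victims)
```

Here web is scheduled before batch and misc, and if it needs 3 CPUs on a full node, the two lower-priority Pods are chosen as victims while cache, which has the same priority as web, is left alone.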
