kubernetes
1. kubernetes
1.2. Kubernetes GUI/CLI
1.2.1. derailed/k9s: 🐶 Kubernetes CLI To Manage Your Clusters In Style!
https://k9scli.io/topics/commands/
https://github.com/derailed/k9s/issues/1678
https://github.com/derailed/k9s/tree/master/plugins
https://k9scli.io/topics/plugins/
1.2.1.1. k9s improvements
- kubectl that, depending on whether you are inside a given folder (and its subfolders) or a git repo, deploys to a predefined target, independent of the global context
- Run kubectl as root from k9s: https://github.com/jordanwilson230/kubectl-plugins/blob/krew/kubectl-exec-as
kubectl exec-as -d -u 0 pod-name-647f5cc8ff-zx5mk -- /bin/bash
It is outdated because it uses docker instead of crictl, and crictl apparently cannot be told which user to exec as
https://github.com/cri-o/cri-o/issues/3770
https://github.com/kubernetes-sigs/cri-tools/issues/507
Can this solve it? https://jbn1233.medium.com/theres-no-cricrl-exec-u-0-and-this-is-a-solution-f33198365054
- kubectl: launch a cronjob or a deployment with a different entrypoint. For example, set the entrypoint to sleep inf and then run the real command from a console to see what happens (see the first sketch after this list)
- kubectl: get old logs from a pod
https://stackoverflow.com/questions/53411958/how-to-view-logs-of-failed-jobs-with-kubectl
- looking at the logs with timestamps, mark two dates with kubectl and benchmark how long it took, maybe by marking two points, or with a grep… (see the second sketch after this list)
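A sketch of the different-entrypoint idea, assuming a CronJob called my-cronjob and the jq-wrapper yq (the one used elsewhere in these notes); all names are placeholders:
# Create a one-off Job from the CronJob, overriding the command with an infinite sleep
kubectl create job my-debug-job --from=cronjob/my-cronjob --dry-run=client -o json \
  | yq -y '.spec.template.spec.containers[0].command = ["sleep", "infinity"]' \
  | kubectl apply -f -
# Then open a shell in its pod and run the real command by hand
kubectl exec -it job/my-debug-job -- /bin/bash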
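And a sketch for the logs/benchmarking idea, using flags kubectl already has (--previous, --timestamps, --since-time); the grep markers are hypothetical and date -d is GNU date:
kubectl logs my-pod --previous --timestamps                           # previous container instance, with timestamps
kubectl logs my-pod --timestamps --since-time='2024-01-01T10:00:00Z'  # only lines after a marker date
# Benchmark between two log markers by diffing their timestamps
t1=$(kubectl logs my-pod --timestamps | grep 'job started'  | head -n1 | cut -d' ' -f1)
t2=$(kubectl logs my-pod --timestamps | grep 'job finished' | head -n1 | cut -d' ' -f1)
echo $(( $(date -d "$t2" +%s) - $(date -d "$t1" +%s) )) seconds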
1.2.1.2. Config files
~/.config/k9s/config.yaml
~/.config/k9s/config.yml
~/.local/share/k9s/clusters/<cluster>/<cluster>/config.yaml
~/.kube/config
1.2.2. kubernetes-sigs/kui: A hybrid command-line/UI development experience for cloud-native development
1.2.3. Other Tools | Kubernetes
1.2.4. Alternatives
1.3. Links
- https://learnk8s.io/
- calico, istio
- kubectx: switch between different kubectl contexts, the way venvs do
- Interactive scenarios for learning kubernetes
1.4. Limits and Requests
https://home.robusta.dev/blog/kubernetes-memory-limit
CPU is fundamentally different than memory. CPU is a compressible resource and memory is not. In simpler terms, you can give someone spare CPU in one moment when it’s free, but that does not obligate you to continue giving them CPU in the next moment when another pod needs it. There is no downside to giving away idle CPU, because it’s easy and non-violent to reclaim it.
To paraphrase Tim Hockin, one of the Kubernetes maintainers at Google, the best practice for Kubernetes resource limits is to set memory limit=request, and never set CPU limits to avoid Kubernetes CPU throttling.
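A sketch of that advice with kubectl (deployment name and sizes are placeholders); note that this patches resources in place, it will not delete a pre-existing CPU limit:
# memory: limit == request; CPU: request only, no limit
kubectl set resources deployment my-app \
  --requests=cpu=250m,memory=512Mi \
  --limits=memory=512Mi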
1.5. Tricks
1.5.1. Dynamic values in secret.yaml
https://stackoverflow.com/questions/48296082/how-to-set-dynamic-values-with-kubernetes-yaml-file
You can also use envsubst when deploying, e.g.:
cat app/deployment.yaml | envsubst | kubectl apply ...
Also possible without cat: envsubst < deployment.yaml | kubectl apply -f -
cat secret.yaml | sed -E 's/__$/"/g' | sed -E 's/__/"$/g' | envsubst | kubectl apply -f -
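A worked example of that sed pipeline, assuming the manifest marks values as __VAR__ (the variable name is a placeholder):
# secret.yaml contains:   password: __DB_PASSWORD__
# sed 's/__$/"/'  turns the trailing __ into a quote:   password: __DB_PASSWORD"
# sed 's/__/"$/g' turns the leading __ into "$:         password: "$DB_PASSWORD"
# envsubst then replaces $DB_PASSWORD with the environment value
export DB_PASSWORD='s3cret'
cat secret.yaml | sed -E 's/__$/"/g' | sed -E 's/__/"$/g' | envsubst | kubectl apply -f -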
1.5.2. kubectl get secrets
kubectl get secrets --namespace namespace-name -o json | jaq '.items[].data | map_values(@base64d)'
# To see which secret each entry belongs to
kubectl get secrets -o json | jaq '{name: .items[].metadata.name, data: .items[].data | map_values(@base64d)}'
# TODO Do not repeat the name for each data entry; group all the data under a single name (see the sketch below)
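A possible fix for that TODO (untested sketch): iterate over the items so each name groups its own decoded data.
kubectl get secrets -o json | jaq '.items[] | {name: .metadata.name, data: (.data | map_values(@base64d))}'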
1.5.3. kubectl proxy && kubectl port-forward
kubectl proxy --port=4000
# You can use kubernetes DNS <service-name>.<namespace>.svc.cluster.local
# or also 127.0.0.1:4000/api/v1/namespaces/my-namespace/services/my-service-name/proxy
kubectl port-forward --namespace my-namespace $(kubectl get pod --namespace my-namespace --selector="app=my-app-name" --output jsonpath='{.items[0].metadata.name}') 8000:8000
1.5.4. kubectl debug
'debug' provides automation for common debugging tasks for cluster objects identified by resource and name. Pods will be used by default if no resource is specified.
https://learn.microsoft.com/en-us/azure/aks/node-access
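Typical invocations, as a sketch (pod, container, node and image names are placeholders):
# Ephemeral debug container attached to a running pod, targeting one of its containers
kubectl debug -it my-pod --image=busybox:1.36 --target=my-container
# Debug a node: spawns a pod on it with the host filesystem mounted at /host
kubectl debug node/my-node -it --image=ubuntu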
1.5.5. kubectl top
Memory usage of the whole cluster
kubectl top pod -A --sum=true --sort-by=memory
kubectl top pod -A --sum=true | awk 'BEGIN { count = 0 } { count += $4 } END { print; print count, "Mi" }'
kubectl top node --sort-by=memory | awk 'BEGIN { count = 0; perc = 0 } { count += $4; perc += $5; print $0 } END { print count, perc }'
1.5.6. kubectl max memory per deploy and cronjob
kubectl get deploy -A -o json | jq -r '.items.[] | "\(.spec.template.spec.containers.[].resources.limits.memory // "16G") \(.metadata.name)"' | sort -hr
kubectl get cronjob -A -o json | jq -r '.items.[] | "\(.spec.jobTemplate.spec.template.spec.containers.[].resources.limits.memory)\t\(.spec.schedule)\t\(.metadata.name)"'
1.5.7. kubectl get sha256
kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.status.containerStatuses[*].imageID}{"\n"}{end}'
1.5.8. kubectl cp
kubectl cp podname-nls-8rr43:/data data/
1.5.9. kubelet edit config
https://kubernetes-docsy-staging.netlify.app/docs/tasks/administer-cluster/reconfigure-kubelet/
NODE=node-2cpu8gb
kubectl get --raw "/api/v1/nodes/$NODE/proxy/configz" | jq '.kubeletconfig|.kind="KubeletConfiguration"|.apiVersion="kubelet.config.k8s.io/v1beta1"' > config.json
kubectl get --raw "/api/v1/nodes/$NODE/proxy/configz" | jq '.kubeletconfig|.kind="KubeletConfiguration"|.apiVersion="kubelet.config.k8s.io/v1beta1"' | yq -y > config.yaml
1.5.10. kubectl bare metal access (root, privileged)
https://github.com/kvaps/kubectl-node-shell
- See logs of previous pods: https://stackoverflow.com/questions/34084689/view-log-files-of-crashed-pods-in-kubernetes
/var/log/containers/
Exec as root
It didn't work for me, but it should be something like this:
crictl ps
crictl inspect --output go-template --template '{{.info.sandboxID}}' <containerID>
runc exec -t -u 0 <sandboxID> /bin/bash
1.5.11. overwrite app.yaml with latest sha256
# tee to /dev/tty so that you can see docker push terminal output as it happens
checksum=$(docker push cr.io/repo/image:latest | tee /dev/tty | tail -n 1 | sed -E 's/.+digest: (.+) size: [0-9]+/\1/')
sed -Ei "s% - image: cr.io/repo/image.*% - image: cr.io/repo/image@$checksum%" app.yaml
1.6. kubectl plugins (krew)
1.7. kubectl apply (declarative) vs replace (imperative)
- What is the difference between kubectl apply and kubectl replace - Stack Overflow
- Configmap
- immediately update the files mounted by all Pods consuming them
- not update the environment variables or command line arguments until the Pod is restarted (because you cannot replace env variables)
- Pods
- kubectl apply -f deployment.yaml does not work if you are using :latest, you have to use replace (a common workaround is sketched after this list)
- kubectl apply -f deployment.yaml does work if you are using @sha256:b1ab1ab1a…
- Field Merge Semantics | SIG CLI
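Not from the linked answer, but a common workaround for the :latest case: force a fresh rollout, which recreates the pods and re-pulls the tag when imagePullPolicy is Always (deployment name is a placeholder).
kubectl rollout restart deployment/my-app   # recreate pods; re-pulls :latest if imagePullPolicy is Always
kubectl rollout status deployment/my-app    # wait for the rollout to finish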
1.8. Kubernetes DNS
- Kubernetes: DNS. In Kubernetes, DNS names are assigned… | by Claire Lee | Medium
In Kubernetes, DNS names are assigned to Pods and Services for communication by name instead of IP address. The default domain name used for DNS resolution within the cluster is cluster.local, which can be customized if required. The DNS name for a Service follows the format <service-name>.<namespace>.svc.cluster.local, while the DNS name for a Pod follows the format <pod-ip-address-replace-dot-with-hyphen>.<namespace>.pod.cluster.local. CoreDNS operates based on a configuration file called "Corefile" that specifies how the DNS server should operate and respond to incoming requests.
- https://kubernetes.io/docs/concepts/services-networking/service/
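A quick way to test those names from inside the cluster, using the dnsutils image from the Kubernetes DNS-debugging docs (service and namespace are placeholders):
kubectl run -it --rm dnsutils --restart=Never --image=registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3 -- nslookup my-service.my-namespace.svc.cluster.local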
1.9. Can a kubernetes volume be mounted locally?
You have kubectl cp, but that only copies files; it doesn't mount the volume
1.10. kubelet counts active page cache against memory.available
- .NET, rsync, and the Linux Page Cache: A Kubernetes War Story | Unconstant Conjunction
- kubelet counts active page cache against memory.available (maybe it shouldn’t?) · Issue #43916 · kubernetes/kubernetes
- https://unix.stackexchange.com/questions/36907/drop-a-specific-file-from-the-linux-filesystem-cache
- https://serverfault.com/questions/278454/is-it-possible-to-list-the-files-that-are-cached
- Cgroups - Deep Dive into Resource Management in Kubernetes | Martin Heinz | Personal Website & Blog
- https://unix.stackexchange.com/questions/572328/how-to-turn-off-block-cache-for-individual-processes
- https://github.com/Feh/nocache
- slabtop -s c
# Flush the page cache of the 42 largest open regular files
# (dd with count=0, conv=notrunc and oflag=nocache writes nothing, it just drops the cache for each file)
lsof -Fnst | awk '
  { field = substr($0,1,1); sub(/^./,""); }
  field == "p" { pid = $0; }
  field == "t" { if ($0 == "REG") size = 0; else next; }
  field == "s" { size = $0; }
  field == "n" && size != 0 { print size, $0; }
' | sort -k1n -u | tail -n42 | sed 's/^[0-9]* //' | xargs -I {} dd of='{}' oflag=nocache conv=notrunc,fdatasync count=0
1.11. The good part is this
User-defined Kubernetes things:
1.12. helm vs kustomize
1.13. sidecar
A sidecar is an extra container in the same Pod that provides a supporting service (DB connection, logs, instrumentation, volumes)
1.13.1. cloud-sql-proxy as a sidecar
If you have a cronjob, the cloud-sql-proxy keeps listening forever.
That is fine for deployments, but for cronjobs the proxy stays up and the Job never finishes.
An endpoint is enabled at localhost:9091/quitquitquit so the main container can stop the proxy (9091 matches the --quitquitquit flag and the curl in the YAML below).
https://lorenzofelletti.medium.com/great-article-d0eb61bf389d
https://stackoverflow.com/questions/62503682/how-to-shut-down-cloud-sql-proxy-in-a-helm-chart-pre-install-hook
https://medium.com/finnovate-io/how-to-prevent-kubernetes-cron-jobs-with-sidecar-containers-from-getting-stuck-912c0f1497a3
containers:
  - command:
      - /cloud-sql-proxy
      - database-instance-name:europe-west1:gcloud-project
      - -p
      - "5432"
      - --credentials-file=/secrets/cloudsql/credentials.json
      - --quitquitquit
    image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.8.2
    imagePullPolicy: IfNotPresent
    name: cloudsql-proxy-gcloud-project
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
      - mountPath: /secrets/cloudsql
        name: cloudsql-instance-credentials
        readOnly: true
  - name: cronjob-daily
    image: eu.gcr.io/gcloud-project/docker-image:latest
    args:
      - /bin/bash
      - -c
      - python cron_hourly.py; curl -X POST localhost:9091/quitquitquit
1.14. Project and cluster configuration with gcloud + kubectl
https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-access-for-kubectl
gcloud container clusters get-credentials myproject-dev --project myproject-dev --zone europe-west1-b
gcloud container clusters get-credentials cluster-test --project myproject-test --zone europe-west1  # This selects the project for kubectl
kubectl config set-context --current --namespace=myproject  # This selects the namespace
gcloud config set project myproject  # This selects the gcloud project
kubectl proxy --port=4000  # Spins up a proxy directly to the cluster
1.15. Project and cluster configuration with azure + kubectl
docker login -u my-project-token my-project.azurecr.io
az login
az account set --subscription e5427996-29ad-4773-477d-aa58c8bf10bc
az aks get-credentials --name my-project --resource-group RES-IN-AZ-ENV
kubectl config set-context --current --cluster my-project --namespace='my-namespace'
k9s -n my-namespace --context my-project-aks
1.16. Config with cert-manager (cmctl)
cmctl status certificate service-tls --namespace service
1.17. Debugging
kubectl exec -it pod-name-dff949685-fr46b -- /bin/bash
1.18. https://blog.cetinich.net/bookmarks/
- https://kubedex.com/90-days-of-aws-eks-in-production/
- https://jvns.ca/blog/2017/06/04/learning-about-kubernetes/
- https://jvns.ca/blog/2017/07/27/how-does-the-kubernetes-scheduler-work/
- https://jvns.ca/blog/2017/08/05/how-kubernetes-certificates-work/
- https://jvns.ca/blog/2017/10/05/reasons-kubernetes-is-cool/
- https://jvns.ca/blog/2017/10/10/operating-a-kubernetes-network/
- https://jvns.ca/blog/2020/04/29/why-strace-doesnt-work-in-docker/
- https://www.eksworkshop.com/010_introduction/basics/concepts_objects/
- https://redhatspain.com/kubernetes/
1.19. Concepts
1.20. API reference
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28
https://github.com/kubernetes/api/blob/master/core/v1/types.go
1.20.1. Notation (Python-like syntax)
- <> indicates a generic object type (it could be specified, but is deliberately left generic in this summary)
- def indicates default values
- Booleans are written option:False to indicate their default value
- Fields follow: name: type. Comment [:{subfields: type. Comment}]
- A type before braces applies to every variable inside, unless overridden: a:Resource:str{b, c, d:int} means a is Resource, b and c are str, d is int
- If the type is included in the name, it is marked with **: lastProbe*Time* is of type Time, and *ContainerStatus*es is of type ContainerStatus list
- Types that share the same fields are described together: [limits:<>.Maximum amount, requests:<>.Minimum amount] :<>{cpu, memory, hugepages}
- Enumerations use (): state: str("start", "run", "stop", "fail")
1.20.2. Container
1.20.2.1. Container v1 core
- args: str list
- command: str list
- env: EnvVar list:str{name, value}
- image: str
- imagePullPolicy: str ("Always" def if image~=":latest", "Never", def "IfNotPresent")
- lifecycle: Lifecycle:{postStart: Handler, preStop: Handler}
- livenessProbe: Probe:<>{exec, failureThreshold, httpGet, initialDelaySeconds, periodSeconds, successThreshold, tcpSocket, timeoutSeconds}
- name: str
- ports: ContainerPort list:<>{containerPort, hostIP, hostPort, name, protocol}
- readinessProbe: Probe
- resources: ResourceRequirements{[limits:<>.Maximum amount, requests:<>.Minimum amount] :<>{cpu, memory, hugepages}}
- *securityContext*: <>{allowPrivilegeEscalation, capabilities, privileged, procMount, readOnlyRootFilesystem, runAsGroup, runAsNonRoot, runAsUser, seLinuxOptions, windowsOptions}
- startupProbe: Probe
- stdin: False. Allocate a buffer for stdin?
- stdinOnce: False. Close stdin once it has been opened?
- terminationMessagePath: str
- terminationMessagePolicy: str
- tty: False. Allocate tty? Requires stdin: True
- *volumeDevice*s: str{devicePath, name}
- volumeMounts: VolumeMount list: str{mountPath, mountPropagation, name, readOnly:False, subPath, subPathExpr}
- workingDir: str
1.20.2.2. ContainerStatus v1 core
- containerID: "docker://<container_id>"
- image: str
- imageID: str
- lastState: ContainerState:<>{running:{startedAt}, terminated:{containerID, exitCode, finishedAt, message, reason, signal, startedAt}, waiting:{message, reason}}
- name: str
- ready: bool
- restartCount: int
- started: bool
- state: ContainerState
1.20.3. Pod v1 core
- [apiVersion, kind]: str
- metadata: ObjectMetadata
- spec: PodSpec
- status: PodStatus
1.20.3.1. PodSpec v1 core
- activeDeadlineSeconds: int. Seconds active before pod is marked as failed
- affinity: Affinity
- automountServiceAccountToken: bool
- *container*s
- dnsConfig: <>
- dnsPolicy: str
- enableServiceLinks: true. Docker links(?)
- *ephemeralContainer*s
- *hostAlias*es: {hostnames: str list, ip: str}
- hostIPC: False
- hostNetwork: False
- hostPID: False
- hostname: str
- imagePullSecrets: LocalObjectReference <>
- init*Container*s
- nodeName: str
- nodeSelector: <>
- overhead: <>. Autopopulated
- readinessGates: PodReadinessGate list
- restartPolicy: str(def "Always", "OnFailure", "Never")
- runtimeClassName: str
- schedulerName: str
- securityContext: PodSecurityContext
- serviceAccountName: str
- shareProcessNamespace: bool
- subdomain: str
- terminationGracePeriodSeconds: int. Seconds the pod needs to terminate gracefully, may be decreased in delete requests. def 30
- *toleration*s
- *topologySpreadConstraint*s
- *volume*s: <>{awsElasticBlockStore, azureDisk, azureFile, cephfs, cinder, configMap, csi, downwardAPI, emptyDir, fc, flexVolume, flocker, gcePersistentDisk, gitRepo, glusterfs, hostPath, iscsi, name, nfs, persistentVolumeClaim, photonPersistentDisk, portworxVolume, projected, quobyte, rbd, scaleIO, secret, storageos, vsphereVolume}
1.20.3.2. PodStatus v1 core
- conditions: PodCondition list
- *containerStatus*es
- ephemeral*ContainerStatus*es
- hostIP: str
- init*ContainerStatus*es
- message: str
- nominatedNodeName: str
- phase: str
- podIP: str
- podIPs: PodIP list
- qosClass: str
- reason: str
- start*Time*
1.20.4. Deployment v1 apps
- [apiVersion, kind]: str
- metadata: ObjectMetadata
- spec: DeploymentSpec
- status: DeploymentStatus
1.20.4.1. DeploymentSpec v1 apps
- minReadySeconds: int. How many seconds should a pod be running (no containers crashing) for it to be considered available? def 0
- paused: bool
- progressDeadlineSeconds: The maximum time in seconds for a deployment to make progress before it is considered to be failed. def 600
- replicas: int. Number of desired pods. def 1
- revisionHistoryLimit: int. Number of old ReplicaSets to retain to allow rollback. def 10
- selector: LabelSelector{matchExpressions: LabelSelectorRequirement list, matchLabels:<>}
- strategy: DeploymentStrategy: {rollingUpdate: RollingUpdateDeployment: {maxSurge:<>. Maximum number of pods that can be scheduled above the desired number of pods, can be an absolute number or a percentage of desired pods (number is rounded up). def 25%, maxUnavailable:<>. Maximum number of pods that can be unavailable during the update, can be an absolute number or a percentage of desired pods (number is rounded down).}, type: str("Recreate", def "RollingUpdate")}
- template: PodTemplateSpec: {metadata: ObjectMetadata, spec: PodSpec}
1.20.4.2. DeploymentStatus v1 apps
- availableReplicas: int. Total number of available pods (ready for at least minReadySeconds)
- collisionCount: int. Count of hash collisions for the Deployment
- conditions: DeploymentCondition list: {lastTransition*Time*. Last time the condition transitioned from one status to another, lastUpdate*Time*. Last time this condition was updated, [message, reason, status, type]: str}
- observedGeneration: int
- readyReplicas: int
- replicas: int
- unavailableReplicas: int
- updatedReplicas: int
1.20.5. Job v1 batch
- [apiVersion, kind]: str
- metadata: ObjectMetadata
- spec: JobSpec
- status: JobStatus
1.20.5.1. JobSpec v1 batch
- activeDeadlineSeconds: int. Seconds relative to the startTime that the job may be active before the system tries to terminate it
- backoffLimit: int. Number of retries before marking this job failed. def 6
- completions: int. Desired number of successfully finished pods the job should be run with
- parallelism: int. Maximum desired number of pods the job should run at any given time
- selector: LabelSelector. A label query over pods (should match the pod count), usually autocompleted
- template: PodTemplateSpec
- ttlSecondsAfterFinished: int
1.20.5.2. JobStatus v1 batch
- active: int. Number of actively running pods
- completion*Time*
- conditions: JobCondition list. Latest available observations of an object's current state: {lastProbe*Time*, lastTransition*Time*, [message, reason, status, type]: str}
- failed: int. Number of pods which reached phase Failed
- start*Time*
- succeeded: int. Number of pods which reached phase Succeeded
1.20.6. StatefulSet v1 apps
- [apiVersion, kind]: str
- metadata: ObjectMetadata
- spec: StatefulSetSpec
- status: StatefulSetStatus
1.20.6.1. StatefulSetSpec v1 apps
- podManagementPolicy: str(def "OrderedReady": pods are created in increasing order and the controller waits until each pod is ready; when scaling down, pods are removed in the opposite order, "Parallel")
- replicas: int. Desired number of replicas. def 1
- revisionHistoryLimit: int. Maximum number of revisions maintained in revision history. def 10
- selector: LabelSelector
- serviceName: str. Name of the service that governs this StatefulSet. This service must exist before the StatefulSet, and is responsible for the network identity of the set
- template: PodTemplateSpec
- updateStrategy: StatefulSetUpdateStrategy {rollingUpdate: RollingUpdateStatefulSetStrategy{partition: int. Indicates the ordinal at which the StatefulSet should be partitioned. def 0}, type: str(def "RollingUpdate")}
- volumeClaimTemplates: PersistentVolumeClaim list: {[apiVersion, kind]: str, metadata: ObjectMetadata, spec: PersistentVolumeClaimSpec, status: PersistentVolumeClaimStatus}
1.20.6.2. StatefulSetStatus v1 apps
1.20.7. CronJob v1beta1 batch
- [apiVersion, kind]: str
- metadata: ObjectMetadata
- spec: CronJobSpec
- status: CronJobStatus
1.20.7.1. CronJobSpec v1beta1 batch
- concurrencyPolicy: str(def "Allow": allows CronJobs to run concurrently, "Forbid": forbids concurrent runs, skipping the next run if the previous run hasn't finished yet, "Replace": cancels the currently running job and replaces it with a new one)
- failedJobsHistoryLimit: int. Number of failed finished jobs to retain. def 1
- jobTemplate: JobTemplateSpec {metadata: ObjectMeta, spec: JobSpec}
- schedule: str. The schedule in Cron format
- startingDeadlineSeconds: int. Optional deadline in seconds for starting the job if it misses its scheduled time for any reason. Missed job executions will be counted as failed ones
- successfulJobsHistoryLimit: int. Number of successful finished jobs to retain. def 3
- suspend: false. This flag tells the controller to suspend subsequent executions; it does not apply to already started executions
1.20.8. LabelSelector v1 meta
- matchExpressions: LabelSelectorRequirement list. The requirements are ANDed: {key: str, operator: str("In", "NotIn", "Exists", "DoesNotExist"), values: str list. If the operator is In or NotIn, the values array must be non-empty; if the operator is Exists or DoesNotExist, the values array must be empty}
- matchLabels: <>. A map of {key,value} pairs. A single {key,value} in the matchLabels map is equivalent to an element of matchExpressions whose key field is "key", the operator is "In", and the values array contains only "value". The requirements are ANDed.
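The matchLabels ↔ matchExpressions equivalence in YAML form (label key and value are placeholders):
selector:
  matchLabels:
    app: web
# is equivalent to
selector:
  matchExpressions:
    - key: app
      operator: In
      values: ["web"]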
1.20.9. Service v1 core
- [apiVersion, kind]: str
- spec: ServiceSpec
- status: ServiceStatus
1.20.9.1. ServiceSpec v1 core
- allocateLoadBalancerNodePorts: bool def true
- clusterIP: str. Optional
- clusterIPs: list[str]. Optional
- externalIPs: list[str]
- externalName: str
- externalTrafficPolicy: str(def "Cluster", "Local")
- healthCheckNodePort: int
- selector: <>. Route service traffic to pods with label keys and values matching this selector
1.20.10. Secrets in environment variables
Create the Secret
apiVersion: v1
kind: Secret
metadata:
  name: mysecret
type: Opaque
data:
  USER_NAME: YWRtaW4=
  PASSWORD: MWYyZDFlMmU2N2Rm
Reference the Secret
apiVersion: v1
kind: Pod
metadata:
  name: secret-test-pod
spec:
  containers:
    - name: test-container
      image: k8s.gcr.io/busybox
      command: [ "/bin/sh", "-c", "env" ]
      envFrom:
        - secretRef:
            name: mysecret
  restartPolicy: Never
The environment variables already carry the required information.
You can separate several secrets with a newline and ---
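To check the result: the example pod just runs env, so its log contains the decoded values.
kubectl logs secret-test-pod | grep -E 'USER_NAME|PASSWORD'
# USER_NAME=admin
# PASSWORD=1f2d1e2e67df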
1.20.11. Regularities
- All objects have spec, which is the desired state of the object, and status, which is the current state.
- TemplateSpec fields always follow ABCTemplateSpec: {metadata: ObjectMetadata, spec: ABCSpec}
1.20.11.1. Conditions
JobCondition, DeploymentCondition
1.21. Node Scheduling: Taints vs Tolerations
https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/
Node affinity is a property of Pods that attracts them to a set of nodes (either as a preference or a hard requirement). Taints are the opposite – they allow a node to repel a set of pods.
Tolerations are applied to pods. Tolerations allow the scheduler to schedule pods with matching taints. Tolerations allow scheduling but don’t guarantee scheduling: the scheduler also evaluates other parameters as part of its function.
Taints and tolerations work together to ensure that pods are not scheduled onto inappropriate nodes. One or more taints are applied to a node; this marks that the node should not accept any pods that do not tolerate the taints.
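A minimal sketch (node name, key and value are placeholders):
# Taint the node: nothing schedules here unless it tolerates key1=value1
kubectl taint nodes node1 key1=value1:NoSchedule
# Remove the taint (trailing minus)
kubectl taint nodes node1 key1=value1:NoSchedule-
# Matching toleration in the pod spec:
#   tolerations:
#     - key: "key1"
#       operator: "Equal"
#       value: "value1"
#       effect: "NoSchedule"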
1.22. “3 Years of Kubernetes in Production–Here’s What We Learned”
“3 Years of Kubernetes in Production–Here’s What We Learned” by Komal Venkatesh Ganesan
https://link.medium.com/hv7uaNtsS9