Kubernetes
Revision as of 19:55, 12 November 2018
Useful
alias:
alias k="kubectl"
alias ks="kubectl --namespace kube-system"                                           # Kubernetes System stuff
alias ke="kubectl get events --sort-by='{.lastTimestamp}'"                           # Kubernetes Events
alias kse="kubectl --namespace kube-system get events --sort-by='{.lastTimestamp}'"  # Kubernetes System Events
dump all:
kubectl get all --export=true -o yaml
( namespace kube-system not dumped )
list form:
k get pods
k get rs   # replica set
k get rc   # replication controller
what are all the things ?
kubectl api-resources
events sorted by time
kubectl get events --sort-by=.metadata.creationTimestamp
what storage classes does my cluster support?
k get storageclass
how are pods spread out over nodes:
k describe node | grep -E '(^Non-t|m |^Name)' | more
( doesn't scale well, and gives no indication of instance group )
if you used kops to deploy the cluster then nodes are labeled with their instance group, so you can be more specific:
k describe node -l kops.k8s.io/instancegroup=<instance group name> | grep -E '(^Non-t|m |^Name)' | more
how many nodes in each instance group? ( tested under kops )
for i in `kops get ig 2>/dev/null | grep -v NAME | awk '{print $1}'`; do
  echo $i
  kubectl get nodes -l kops.k8s.io/instancegroup=$i
done
how many pods per node:
k get pod -o wide | grep -v NAME | awk '{print $8}' | sort | uniq -c | sort -rn
k get pod --all-namespaces -o wide | grep -v NAME | awk '{print $8}' | sort | uniq -c | sort -rn
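The awk/sort/uniq tail of these pipelines can be sanity-checked offline on a canned sample (the rows below are fabricated; with --all-namespaces, field 8 is the node name):

```shell
# fabricated sample of `kubectl get pod --all-namespaces -o wide` output
sample='NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
default web-1 1/1 Running 0 1d 10.0.0.1 node-a
default web-2 1/1 Running 0 1d 10.0.0.2 node-b
kube-system dns-1 1/1 Running 0 2d 10.0.0.3 node-a'

# same tail as above: drop the header, take the NODE column, count per node
counts=$(printf '%s\n' "$sample" | grep -v NAME | awk '{print $8}' | sort | uniq -c | sort -rn)
echo "$counts"
```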
audit: who tried to do what?
ks get pod | grep kube-apiserver-ip
ks logs $podname
who tried to scale unsuccessfully?
ks logs $podname | grep scale | grep cloud | awk '$8!=200{print $0}'
Where is the service account token that I gave this pod?
It's in here: /var/run/secrets/kubernetes.io/serviceaccount/token
Scripting Scaling
Manually edit the replicas of a deployment from within the same namespace, but in a different pod:
- give the actor pod a service account ( possibly via its deployment ).
- create a Role as below.
- create the RoleBinding to connect the ServiceAccount to the Role.
Now you have: Pod -> Deployment -> ServiceAccount -> RoleBinding -> Role
Now the Pod has permission to do what it needs. Very similar to AWS's "IAM Role" where you give an instance a role that has the permissions that it needs to operate.
Note that in this case "ClusterRole" and "ClusterRoleBinding" are not required. It's all namespaced to the namespace that your deployment is in; in this case: "default".
export API_URL="https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}/${KUBE_ENDPOINT}"
export TOKEN=`cat /var/run/secrets/kubernetes.io/serviceaccount/token`
export CURL_CA_BUNDLE=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt

curl \
  -H 'Accept: application/json' \
  -H "Authorization: Bearer $TOKEN" \
  $API_URL \
  > scale.json

# edit scale.json, set replicas to 4

curl -X PUT \
  -d@scale.json \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $TOKEN" \
  $API_URL
CURL_CA_BUNDLE - Kubernetes is its own CA, and presents to each pod a CA bundle that makes SSL inside the cluster valid.
This was the role that did it. FIXME: pare it down
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kube-cloudwatch-autoscaler
  labels:
    app: kube-cloudwatch-autoscaler
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - list
- apiGroups:
  - apps
  resources:
  - deployments
  - deployments.apps
  - deployments.apps/scale
  - "*/scale"
  verbs:
  - get
  - update
  - patch
  - put
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - get
  - create
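For completeness, a minimal sketch of the ServiceAccount and RoleBinding halves of the chain described above; the Role name matches the manifest above, the other names and the "default" namespace are assumptions:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-cloudwatch-autoscaler
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kube-cloudwatch-autoscaler
  labels:
    app: kube-cloudwatch-autoscaler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kube-cloudwatch-autoscaler
subjects:
- kind: ServiceAccount
  name: kube-cloudwatch-autoscaler
  namespace: default
```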
On patching
There are a couple of ways to change an object.
export TOKEN=`cat /var/run/secrets/kubernetes.io/serviceaccount/token`
export CURL_CA_BUNDLE=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
1. dump the whole object, make the change, post the object back ( as above ): GET -> PUT
curl \
  -v \
  -H 'Accept: application/json' \
  -H "Authorization: Bearer $TOKEN" \
  $API_URL \
  > scale.json

# edit scale.json, set replicas to 4

curl -X PUT \
  -d@scale.json \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $TOKEN" \
  $API_URL
2. terse PATCH
curl -sS \
  -X 'PATCH' \
  -H "Authorization: Bearer ${TOKEN}" \
  -H 'Content-Type: application/merge-patch+json' \
  $API_URL \
  -d '{"spec": {"replicas": 1}}'
3. old / full PATCH ?
reference: https://stackoverflow.com/questions/41792851/manage-replicas-count-for-deployment-using-kubernetes-api ( 1 year 8 months old at time of _this_ writing )
Careful, compare:
BORKEN!
PAYLOAD='[{"op":"replace","path":"/spec/replicas","value":"3"}]'
curl \
  -X PATCH \
  -d ${PAYLOAD} \
  -H 'Content-Type: application/json-patch+json' \
  -H "Authorization: Bearer ${TOKEN}" \
  $API_URL
WERKS!
curl \
  -X PATCH \
  -d '[{"op":"replace","path":"/spec/replicas","value":3}]' \
  -H 'Content-Type: application/json-patch+json' \
  -H "Authorization: Bearer ${TOKEN}" \
  $API_URL
Closely:
-d '[{"op":"replace","path":"/spec/replicas","value":"3"}]'   <- broken
-d '[{"op":"replace","path":"/spec/replicas","value":3}]'     <- works
The quoted "3" is a JSON string, but spec.replicas is an integer field, so the API server rejects it.
Template examples
list images by pod:
kubectl get pods --all-namespaces -o=jsonpath='{range .items[*]}{"\n"}{.metadata.name}{":\t"}{range .spec.containers[*]}{.image}{", "}{end}{end}{"\n"}'
list images by deploy:
kubectl get deploy -o=jsonpath='{range .items[*]}{"\n"}{.metadata.name}{":\t"}{range .spec.template.spec.containers[*]}{.image}{", "}{end}{end}{"\n"}'
metrics
wget "$(kubectl config view -o jsonpath='{range .clusters[*]}{@.cluster.server}{"\n"}{end}')"
Practices and Guidelines
- Do not use replication controllers, instead use replica sets
Cgroup / slice errors
https://github.com/kubernetes/kubernetes/issues/56850
log message:
Sep 18 21:32:37 ip-10-10-37-50 kubelet[1681]: E0918 21:32:37.901058 1681 summary.go:92] Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"
MAAS ubuntu
https://stripe.com/blog/operating-kubernetes
https://medium.com/@adriaandejonge/moving-from-docker-to-rkt-310dc9aec938
https://coreos.com/rkt/docs/latest/rkt-vs-other-projects.html#rkt-vs-docker
Security
Todo / read:
- https://github.com/aquasecurity/kube-hunter/blob/master/README.md
- https://www.arctiq.ca/events/2018/10/5/building-a-secure-container-strategy-with-aqua-security-microsoft-azure-and-hashicorp-vault/
References and Reading
- Replica set versus Replication controller
- https://www.mirantis.com/blog/kubernetes-replication-controller-replica-set-and-deployments-understanding-replication-options/
- Publishing services - service types
- https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services-service-types
- Kubernetes the hard way
- https://github.com/kelseyhightower/kubernetes-the-hard-way
HPA broken
Blue is test
Blue env:
Client Version: v1.12.2
Server Version: v1.10.6
Prod env:
Client Version: v1.12.2
Server Version: v1.9.8
In prod HPAs work. When I ask for them I see:
NAME        REFERENCE              TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
adjust      Deployment/adjust      0%/70%    1         5         1          1d
web-admin   Deployment/web-admin   0%/70%    1         3         1          2h
In blue env they don't work, I see:
NAME        REFERENCE              TARGETS        MINPODS   MAXPODS   REPLICAS   AGE
adjust      Deployment/adjust      <unknown>/70%  1         5         1          1d
web-admin   Deployment/web-admin   <unknown>/70%  1         3         1          2h
In Kubernetes events we see:
HorizontalPodAutoscaler Warning FailedGetResourceMetric horizontal-pod-autoscaler unable to get metrics for resource cpu: no metrics returned from resource metrics API
Note that the metrics server is running in kube-system, but there are no repo files for that in "/third-party" in prod.
In blue we store all metrics-server related files in "/third-party/metrics-server" ( taken from git@github.com:kubernetes-incubator/metrics-server.git ).
In prod the deployment has:
- command:
  - /metrics-server
  - --source=kubernetes.summary_api:''
In blue this seemed to do the trick
- /metrics-server
- --kubelet-preferred-address-types=InternalIP
- --kubelet-insecure-tls
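To confirm the change took (assuming metrics-server is otherwise healthy), the metrics API should start answering:

```shell
# lists per-node CPU/memory once metrics-server responds
kubectl top nodes
# the HPA TARGETS column should then change from <unknown> to a percentage
kubectl get hpa
```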
Cluster scaling
ks get configmap cluster-autoscaler-status -o yaml
Steps to move hardware around
In this case we are removing the last node from an instance group and then removing the instance group.
Reference: https://kubernetes.io/docs/concepts/architecture/nodes/
1. Cordon the node
k cordon ip-xx-xx-xx-xx.region.compute.internal
No new pods will be deployed here.
2. Drain ( move pods from here to somewhere else )
k drain ip-xx-xx-xx-xx.region.compute.internal
You may need to add "--ignore-daemonsets" if you have daemonsets running ( Datadog, local Redis ).
You may need "--delete-local-data" if you have a metrics server on this node. BE CAREFUL. You will lose metrics, but probably you have an "out of cluster" place where metrics are stored ( Datadog, Elasticsearch, etc ).
3. Remove the node group from the autoscaler:
ks edit deploy cluster-autoscaler
4. Tell kops to delete the instance group.
kops delete ig myig
Also See
kops - automated Kubernetes cluster builds.