Exploring Prometheus monitoring

Using Prometheus components

blackbox_exporter

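  • Configures the protocol modules (probers) used to collect information from the monitored targets. Default port: 9115.
  • Probe results are retrieved over HTTP (REST-style) requests; the configuration file defines which protocols are supported.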
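As a minimal sketch of the two points above, the script below writes a small module configuration and builds the probe request URL. The file name blackbox.yml, the OUT_DIR variable, the choice of the http_2xx and tcp_connect modules, and the probe_url helper are illustrative assumptions; the /probe endpoint with its target and module query parameters is what blackbox_exporter serves on port 9115.

```shell
#!/usr/bin/env bash
# Directory for the generated config (assumption: a temp dir unless OUT_DIR is set).
OUT_DIR="${OUT_DIR:-$(mktemp -d)}"

# Write a minimal blackbox_exporter config with two illustrative modules.
cat > "$OUT_DIR/blackbox.yml" <<'EOF'
modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      method: GET
  tcp_connect:
    prober: tcp
    timeout: 5s
EOF

# Build the probe URL for a target and module (hypothetical helper).
# The exporter address defaults to localhost:9115.
probe_url() {
  local target="$1" module="$2" host="${3:-localhost:9115}"
  printf 'http://%s/probe?target=%s&module=%s\n' "$host" "$target" "$module"
}
```

With the exporter started as blackbox_exporter --config.file="$OUT_DIR/blackbox.yml", running curl -s "$(probe_url example.com http_2xx)" should return the probe metrics, including probe_success.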

alertmanager

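  • Configures alert notifications and the related upstream and downstream integrations. Default port: 9093.

Example: when configuring WeChat Work (企业微信) alert notifications, create a department and make the application visible to that department; otherwise an error is reported. The to_user parameter can be set to @all.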
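The WeChat Work example could look roughly like the sketch below, which writes an alertmanager.yml with a single wechat_configs receiver. The receiver name, the OUT_DIR variable, and the angle-bracket placeholders are assumptions to be replaced with real values; corp_id, agent_id, api_secret, and to_user are the receiver fields Alertmanager defines for WeChat.

```shell
#!/usr/bin/env bash
# Directory for the generated config (assumption: a temp dir unless OUT_DIR is set).
OUT_DIR="${OUT_DIR:-$(mktemp -d)}"

# Minimal Alertmanager config: route every alert to one WeChat Work receiver.
# Values in angle brackets are placeholders, not real credentials.
cat > "$OUT_DIR/alertmanager.yml" <<'EOF'
route:
  receiver: wechat
receivers:
  - name: wechat
    wechat_configs:
      - corp_id: "<corp-id>"
        agent_id: "<agent-id>"
        api_secret: "<app-secret>"
        to_user: "@all"
        send_resolved: true
EOF

echo "wrote $OUT_DIR/alertmanager.yml"
```

The file can then be passed to Alertmanager with alertmanager --config.file="$OUT_DIR/alertmanager.yml".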

prometheus server

  • The main service: loads the other service components and configures the monitoring rules and scrape targets. Default port: 9090.
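A sketch of how the server configuration might tie the components together: it points alerting at Alertmanager on 9093 and scrapes an HTTP target through blackbox_exporter on 9115. The job name, the rules/*.yml path, the example.com target, and the OUT_DIR variable are assumptions; the relabeling pattern is the one commonly used with blackbox_exporter.

```shell
#!/usr/bin/env bash
# Directory for the generated config (assumption: a temp dir unless OUT_DIR is set).
OUT_DIR="${OUT_DIR:-$(mktemp -d)}"

# Minimal prometheus.yml: alerting target, rule files, and one blackbox job.
cat > "$OUT_DIR/prometheus.yml" <<'EOF'
global:
  scrape_interval: 15s
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]
rule_files:
  - "rules/*.yml"
scrape_configs:
  - job_name: blackbox_http
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets: ["https://example.com"]   # site to probe (illustrative)
    relabel_configs:
      # Pass the listed target to the exporter as ?target=...
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed site as the instance label.
      - source_labels: [__param_target]
        target_label: instance
      # Send the actual scrape to the blackbox_exporter.
      - target_label: __address__
        replacement: localhost:9115
EOF

echo "wrote $OUT_DIR/prometheus.yml"
```

The server can be started against it with prometheus --config.file="$OUT_DIR/prometheus.yml".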

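Reload the configuration: POST http://localhost:9090/-/reload

Usage

  • Start the components in this order: alertmanager, blackbox_exporter, prometheus. Then browse to <host>:9090/metrics to view the metrics.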
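The reload request can be wrapped in a small helper, sketched below. Note that Prometheus only serves the /-/reload endpoint when it is started with --web.enable-lifecycle. The reload_prometheus function name and the DRY_RUN switch are assumptions added for illustration.

```shell
#!/usr/bin/env bash
# Ask a running Prometheus server to reload its configuration.
# Usage: reload_prometheus [base_url]   (default: http://localhost:9090)
# Set DRY_RUN=1 to print the command instead of sending the request.
reload_prometheus() {
  local base="${1:-http://localhost:9090}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "curl -fsS -X POST ${base}/-/reload"
    return 0
  fi
  # -f makes curl exit non-zero on an HTTP error, e.g. when the
  # lifecycle API is not enabled on the server.
  curl -fsS -X POST "${base}/-/reload"
}
```

Sending SIGHUP to the Prometheus process is an alternative way to trigger the same reload when the lifecycle API is disabled.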

Configure a Jenkins pipeline on Kubernetes with GitHub and Slack

Prerequisites

  • This walkthrough uses a free IBM Cloud account.

    Install the IBM Cloud command-line interface (CLI) on your workstation.

  • The local machine is a Mac running Docker Desktop.

    Also create a Docker Hub account.

  • Install the Kubernetes CLI (kubectl) on the Mac.
  • Install a Git client.

    Sign up for a GitHub account.

  • Create a Slack account.

Key Procedure

  • Set the KUBECONFIG environment variable to point at the cloud cluster.

  • Verify that you can connect to the cluster.
    kubectl version --short

    Client Version: v1.16.1
    Server Version: v1.14.9+IKS

  • Persist jenkins_home. Because this is a single-node cluster, the PV type used is hostPath.

    kubectl apply -f jenkins-pv.yaml
    kubectl apply -f jenkins-pvc.yaml
    kubectl apply -f jenkins-deployment.yaml
    kubectl apply -f jenkins-service.yaml
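The manifests themselves are not shown in these notes. As a rough sketch of what a hostPath-backed volume and its claim can look like, the script below writes both into one file. Every name, the 5Gi size, the /data/jenkins path, the manual storage class, and the OUT_DIR variable are hypothetical and not taken from the original jenkins-pv.yaml or jenkins-pvc.yaml.

```shell
#!/usr/bin/env bash
# Directory for the generated manifest (assumption: a temp dir unless OUT_DIR is set).
OUT_DIR="${OUT_DIR:-$(mktemp -d)}"

# A hostPath PersistentVolume plus a matching claim (all values illustrative).
cat > "$OUT_DIR/jenkins-pv-sketch.yaml" <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: jenkins-pv
spec:
  capacity:
    storage: 5Gi
  accessModes: ["ReadWriteOnce"]
  storageClassName: manual
  hostPath:
    path: /data/jenkins        # directory on the single node
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jenkins-pvc
spec:
  storageClassName: manual     # must match the PV to bind
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 5Gi
EOF

echo "wrote $OUT_DIR/jenkins-pv-sketch.yaml"
```

hostPath ties the data to one node's filesystem, which is acceptable here only because the cluster has a single node.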

  • Get the address of the Jenkins dashboard service.
    export EXTERNAL_IP=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="ExternalIP")].address }')

    export NODE_PORT=30100

    echo $EXTERNAL_IP:$NODE_PORT

    184.172.229.55:30100

  • Get the default Jenkins admin password.

    kubectl logs $(kubectl get pods --selector=app=jenkins -o=jsonpath='{.items[0].metadata.name}') jenkins

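  • Configure credentials: github, dockerhub, kubeconfig, slack-notification.

  • Install plugins: Slack Notification and Kubernetes CLI Plugin.

  • Configure Jenkins Slack Notification: the main fields to fill in are Workspace and Credential. Default channel / member id can be left blank and set in the Jenkinsfile instead, for example: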

    success { slackSend(channel: "#ok", message: "pluckhuang/podinfo:${env.BUILD_NUMBER} Pipeline is successfully completed.")}
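To show where that success line sits, the sketch below writes a skeleton declarative Jenkinsfile with a post section. The single placeholder Build stage, the failure branch, the color values, and the OUT_DIR variable are assumptions; only the success message and the #ok channel come from the line above. slackSend is the step provided by the Slack Notification plugin.

```shell
#!/usr/bin/env bash
# Directory for the generated file (assumption: a temp dir unless OUT_DIR is set).
OUT_DIR="${OUT_DIR:-$(mktemp -d)}"

# The quoted 'EOF' keeps ${env.BUILD_NUMBER} literal so Jenkins, not the
# shell, expands it at build time.
cat > "$OUT_DIR/Jenkinsfile" <<'EOF'
pipeline {
  agent any
  stages {
    stage('Build') {
      steps {
        echo 'build, push and deploy steps go here'
      }
    }
  }
  post {
    success {
      slackSend(channel: "#ok", color: "good", message: "pluckhuang/podinfo:${env.BUILD_NUMBER} Pipeline is successfully completed.")
    }
    failure {
      // Assumed counterpart to the success notification.
      slackSend(channel: "#ok", color: "danger", message: "pluckhuang/podinfo:${env.BUILD_NUMBER} Pipeline failed.")
    }
  }
}
EOF

echo "wrote $OUT_DIR/Jenkinsfile"
```

Because the channel is passed explicitly to slackSend, the Default channel field in the global Slack configuration can stay empty.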

Reference and resource

What is ‘Site Reliability Engineering’?

Main goals:

The main goals of Site Reliability Engineering (SRE) are to create scalable and highly reliable software systems. According to Ben Treynor, founder of Google's Site Reliability Team, SRE is "what happens when a software engineer is tasked with what used to be called operations."[1]

It's also been associated with a practice that encompasses automation of manual tasks, continuous integration and continuous delivery.

SREs, being developers themselves, will naturally bring solutions that help remove the barriers between development teams and operations teams.

Difference between DevOps and SRE:

SRE and DevOps share the same foundational principles. SRE is viewed by many (as cited in the Google SRE book) as a "specific implementation of DevOps with some idiosyncratic extensions."

Source: Wikipedia