[Monitoring] - Monitoring and Alerting on Stream Load with OpenResty


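In Doris, real-time data ingestion is done through Stream Load. In our experience, the frequency of Stream Load requests and the amount of data in each load have a major impact on the stability of the Doris service. At the time of writing, the latest Doris release is 0.13.0, and the recommendation is still to import in micro-batches through Stream Load at minute-level intervals. That makes monitoring and alerting a necessity.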

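Architecture diagram

Grafana: view the monitoring data
Prometheus: alert on abnormal metrics
Doris itself: aggregates Stream Load frequency, counts, and similar statistics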

OpenResty vs Nginx

OpenResty is an application platform built on nginx. It bundles many useful features, such as running Lua scripts directly. More information: https://openresty.org/cn/

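Which metrics can be monitored and alerted on

1. Stream Load frequency per user
2. Stream Load frequency per table
3. Stream Load frequency per db
4. Duration and result of each Stream Load
5. Stream Load counts by day and other dimensions
6. Stream Load rate limiting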
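The first three metrics come down to grouping requests by user, db, and table within a time window. The following is a minimal sketch, assuming each request has already been reduced to a `(user, path, time_local)` tuple taken from the proxy's access log; the function names `minute_bucket` and `load_frequency` are hypothetical, and the one-minute window is an assumption chosen to match the minute-level import interval.

```python
import re
from collections import Counter
from datetime import datetime

# Doris Stream Load URL shape: /api/<db>/<table>/_stream_load
STREAM_LOAD_PATH = re.compile(r"^/api/(?P<db>[^/]+)/(?P<table>[^/]+)/_stream_load")


def minute_bucket(time_local):
    """Truncate an nginx $time_local value to the minute."""
    ts = datetime.strptime(time_local, "%d/%b/%Y:%H:%M:%S %z")
    return ts.strftime("%Y-%m-%d %H:%M")


def load_frequency(requests):
    """Count loads per minute by user, by db, and by (db, table).

    `requests` is an iterable of (user, path, time_local) tuples.
    """
    per_user, per_db, per_table = Counter(), Counter(), Counter()
    for user, path, time_local in requests:
        m = STREAM_LOAD_PATH.match(path)
        if m is None:
            continue  # not a Stream Load request
        minute = minute_bucket(time_local)
        per_user[(minute, user)] += 1
        per_db[(minute, m["db"])] += 1
        per_table[(minute, m["db"], m["table"])] += 1
    return per_user, per_db, per_table
```

An alert rule could then compare any of these per-minute counts against a threshold chosen for the cluster.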

Install OpenResty

wget https://openresty.org/package/centos/openresty.repo
sudo mv openresty.repo /etc/yum.repos.d/
sudo yum check-update
sudo yum install -y openresty

Default installation directory: /usr/local/openresty/

Configure OpenResty

  • Edit nginx.conf
vim /usr/local/openresty/nginx/conf/nginx.conf

Contents:

worker_processes  auto;
error_log  /data/logs/nginx/error.log; ## adjust this path
events {
    worker_connections  10240;
}

http {
    include           mime.types;
    default_type      application/octet-stream;
    include           /etc/nginx/conf.d/*.conf; ## adjust this path

    sendfile           on;
    keepalive_timeout  65;
    server {
        listen          80;
        server_name     localhost;
        location / {
            root   html;
            index  index.html index.htm;
        }
        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   html;
        }
    }
}
  • Configure Stream Load forwarding: add a configuration file under the corresponding conf.d/ directory
vim /etc/nginx/conf.d/doris_stream_load.conf
upstream normal_fe {
    server <fe_ip>:<fe_http_port>; ## replace with the FE host and HTTP port
}
underscores_in_headers on;
log_format  load_access_log_format  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for" '
              '$upstream_response_time $request_time $content_length $http_label $resp_body';


server {
    listen 9001;
    access_log /data/logs/nginx/access.log load_access_log_format; ## adjust this path
    error_log /data/logs/nginx/error.log; ## adjust this path
    client_max_body_size 100000M;
    proxy_connect_timeout 300;
    proxy_send_timeout 300;
    proxy_read_timeout 300;
    send_timeout 300;
    underscores_in_headers on;

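    # Capture the upstream response body so it can be written to the access log.
    set $resp_body "";
    lua_need_request_body on;
    # Keep at most the first 1000 bytes of each response chunk and copy the
    # accumulated text into $resp_body once the last chunk (ngx.arg[2]) arrives.
    body_filter_by_lua '
        local resp_body = string.sub(ngx.arg[1], 1, 1000)
        ngx.ctx.buffered = (ngx.ctx.buffered or "") .. resp_body
        if ngx.arg[2] then
            ngx.var.resp_body = ngx.ctx.buffered
        end
    ';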
    location / {
        proxy_pass http://normal_fe;
        proxy_set_header Expect '100-continue';
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        proxy_intercept_errors on;
        error_page 301 302 307 = @mirrorredirect;
    }

    location @mirrorredirect {
        set $redirect_uri '$upstream_http_location';
        proxy_pass $redirect_uri;
        proxy_set_header Expect '100-continue';
        proxy_pass_request_body on;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

Start OpenResty

Send a Stream Load request through the proxy. The OpenResty access log then contains the monitoring fields, as in this entry:

1.1.3.65 - admin [10/Dec/2020:22:35:36 +0800] "PUT /api/test/tbl_stream_load_mirror_02/_stream_load HTTP/1.1" 200 447 "-" "curl/7.29.0" "-" 0.003 : 0.032 3.034 390 stream_load_mirror_009 {\x0A    \x22TxnId\x22: 14613236,\x0A    \x22Label\x22: \x22stream_load_mirror_009\x22,\x0A    \x22Status\x22: \x22Success\x22,\x0A    \x22Message\x22: \x22OK\x22,\x0A    \x22NumberTotalRows\x22: 10,\x0A    \x22NumberLoadedRows\x22: 10,\x0A    \x22NumberFilteredRows\x22: 0,\x0A    \x22NumberUnselectedRows\x22: 0,\x0A    \x22LoadBytes\x22: 390,\x0A    \x22LoadTimeMs\x22: 31,\x0A    \x22BeginTxnTimeMs\x22: 1,\x0A    \x22StreamLoadPutTimeMs\x22: 1,\x0A    \x22ReadDataTimeMs\x22: 0,\x0A    \x22WriteDataTimeMs\x22: 14,\x0A    \x22CommitAndPublishTimeMs\x22: 14\x0A}

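Each field corresponds one-to-one with load_access_log_format. Note that $upstream_response_time holds two values separated by a colon (0.003 : 0.032), because the request was redirected internally through the error_page handler to a second upstream. Once this data is loaded into Doris, all the monitoring data we want is available. The trailing JSON is the Stream Load response; replacing \x0A with \n and \x22 with " gives the decoded result.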
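The field mapping and the escape decoding can be sketched in Python as follows. This is a minimal sketch, assuming the log line follows load_access_log_format exactly; `parse_line`, `decode_nginx_escapes`, and the regex group names are hypothetical, and `SAMPLE` is the entry above with the JSON abridged to a few fields.

```python
import json
import re

# Mirrors load_access_log_format field by field.
LOG_RE = re.compile(
    r'(?P<remote_addr>\S+) - (?P<user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)" "(?P<forwarded_for>[^"]*)" '
    # One value per upstream contacted, separated by "," or ":".
    r'(?P<upstream_time>[\d.\-]+(?:\s*[,:]\s*[\d.\-]+)*) '
    r'(?P<request_time>[\d.]+) (?P<content_length>\S+) (?P<label>\S+) '
    r'(?P<resp_body>.*)'
)
PATH_RE = re.compile(r'^/api/(?P<db>[^/]+)/(?P<table>[^/]+)/_stream_load')
ESCAPE_RE = re.compile(rb'\\x([0-9A-Fa-f]{2})')


def decode_nginx_escapes(text):
    """Turn nginx's \\xHH log escapes back into the original bytes."""
    raw = ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), text.encode('utf-8'))
    return raw.decode('utf-8', errors='replace')


def parse_line(line):
    """Parse one access-log line; return None if it does not match."""
    m = LOG_RE.match(line.rstrip('\n'))
    if m is None:
        return None
    rec = m.groupdict()
    rec['status'] = int(rec['status'])
    rec['request_time'] = float(rec['request_time'])
    parts = rec['request'].split()
    path = PATH_RE.match(parts[1]) if len(parts) >= 2 else None
    rec['db'] = path['db'] if path else None
    rec['table'] = path['table'] if path else None
    try:
        rec['result'] = json.loads(decode_nginx_escapes(rec['resp_body']))
    except ValueError:
        rec['result'] = None  # empty or truncated response body
    return rec


# The log entry shown above, with the JSON abridged.
SAMPLE = (
    r'1.1.3.65 - admin [10/Dec/2020:22:35:36 +0800] '
    r'"PUT /api/test/tbl_stream_load_mirror_02/_stream_load HTTP/1.1" '
    r'200 447 "-" "curl/7.29.0" "-" 0.003 : 0.032 3.034 390 '
    r'stream_load_mirror_009 {\x0A    \x22TxnId\x22: 14613236,'
    r'\x0A    \x22Label\x22: \x22stream_load_mirror_009\x22,'
    r'\x0A    \x22Status\x22: \x22Success\x22,'
    r'\x0A    \x22LoadTimeMs\x22: 31\x0A}'
)

record = parse_line(SAMPLE)
print(record['user'], record['db'], record['table'], record['result']['Status'])
```

Because the body filter keeps only the first 1000 bytes of each response chunk, a long response may not decode as JSON; the sketch records `None` for `result` in that case rather than failing.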

Parsing the access log and loading it into Doris lets us compute the various metrics used as monitoring data.
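As an illustration of the kind of statistic this enables, here is a minimal sketch that summarises loads per day in Python rather than in Doris. It assumes each record is a dict with `time_local` (the nginx timestamp) and `result` (the decoded Stream Load response, or `None`); `daily_summary` is a hypothetical name, and counting every status other than `Success` as failed is a simplifying assumption that may not suit every status Doris returns.

```python
from collections import defaultdict
from datetime import datetime


def daily_summary(records):
    """Summarise loads per day: total, failed, and mean LoadTimeMs."""
    days = defaultdict(lambda: {"total": 0, "failed": 0, "load_ms": []})
    for rec in records:
        ts = datetime.strptime(rec["time_local"], "%d/%b/%Y:%H:%M:%S %z")
        bucket = days[ts.date().isoformat()]
        result = rec.get("result") or {}
        bucket["total"] += 1
        # Assumption: anything other than "Success" counts as a failure.
        if result.get("Status") != "Success":
            bucket["failed"] += 1
        if "LoadTimeMs" in result:
            bucket["load_ms"].append(result["LoadTimeMs"])
    return {
        day: {
            "total": b["total"],
            "failed": b["failed"],
            "avg_load_ms": sum(b["load_ms"]) / len(b["load_ms"]) if b["load_ms"] else None,
        }
        for day, b in days.items()
    }
```

Once the parsed rows are in a Doris table, the same grouping can be expressed as a query over that table instead.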

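Parsing the access log and pushing the results to Prometheus makes it possible to alert on the various metrics. For how to configure alerts, see: /article/21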
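One way to hand the parsed counts to Prometheus is to render them in its text exposition format, which can then be served for scraping or sent on by whatever push mechanism is in use. The sketch below only builds that text; the metric name `doris_stream_load_requests_total`, the label names, and the helper names are hypothetical, and how the text is delivered is left open.

```python
def escape_label(value):
    """Escape backslash, double quote, and newline in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name, help_text, samples):
    """Render a counter in the Prometheus text exposition format.

    `samples` maps a tuple of (label_name, label_value) pairs to a count.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in sorted(samples.items()):
        label_str = ",".join(f'{k}="{escape_label(v)}"' for k, v in labels)
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical per-table counts taken from the parsed access log.
counts = {(("db", "test"), ("table", "tbl_stream_load_mirror_02")): 1}
print(render_counter(
    "doris_stream_load_requests_total",
    "Stream Load requests seen by the proxy.",
    counts,
))
```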
