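Ceph's mon cluster implements Paxos to provide consistent access. We recently hit a problem on a three-mon cluster: the clock on one peon mon drifted, which caused slow requests on that mon and in turn a serious failure in which OSDs were marked down. A large number of virtual machine workloads raised alarms.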

Reproducing the failure

Every 300s each OSD must report its liveness to the mon by sending it a beacon. Here the beacon reached the mon and then stayed stuck there, as shown below:

        {
            "description": "osd_beacon(pgs [2.248,2.80,2.288,2.360,2.110,3.60,2.268,2.2e0,2.b8,2.2f1,2.19,2.89,2.91,2.331,2.249,2.161,2.d2,3.5a,2.12a,2.13a,2.21a,2.342,2.23a,2.362,2.243,2.3a3,2.83,2.3eb,3.b,2.22b,2.7b,2.2f3,2.3c,3.3c,2.36c,2.74,3.15,2.2c5,2.38d,2.345,2.335,2.1a5,5.5,2.3a6,2.1e6,3.e,2.3ae,2.386,2.1ce,2.286,2.3be,3.5e,3.5f,2.57,2.267,2.1f,2.39f,2.8f,2.c7,2.237] lec 40508 last_purged_snaps_scrub 2023-03-09T11:23:31.503631+0000 v40509)",
            "initiated_at": "2023-06-05T12:38:00.003962+0000",
            "age": 291.61921003200001,
            "duration": 291.61935419600002,
            "type_data": {
                "events": [
                    {
                        "time": "2023-06-05T12:38:00.003962+0000",
                        "event": "initiated"
                    },
                    {
                        "time": "2023-06-05T12:38:00.003961+0000",
                        "event": "throttled"
                    },
                    {
                        "time": "2023-06-05T12:38:00.003962+0000",
                        "event": "header_read"
                    },
                    {
                        "time": "2023-06-05T12:38:00.003964+0000",
                        "event": "all_read"
                    },
                    {
                        "time": "2023-06-05T12:38:00.004099+0000",
                        "event": "dispatched"
                    },
                    {
                        "time": "2023-06-05T12:38:00.004101+0000",
                        "event": "mon:_ms_dispatch"
                    },
                    {
                        "time": "2023-06-05T12:38:00.004101+0000",
                        "event": "mon:dispatch_op"
                    },
                    {
                        "time": "2023-06-05T12:38:00.004101+0000",
                        "event": "psvc:dispatch"
                    },
                    {
                        "time": "2023-06-05T12:38:00.004111+0000",
                        "event": "osdmap:wait_for_readable"
                    },
                    {
                        "time": "2023-06-05T12:38:00.004111+0000",
                        "event": "osdmap:wait_for_readable/paxos"
                    },
                    {
                        "time": "2023-06-05T12:38:00.004118+0000",
                        "event": "paxos:wait_for_readable"
                    }
                ],
                "info": {
                    "seq": 3208527,
                    "src_is_mon": false,
                    "source": "osd.19 v2:10.133.17.70:6824/44",
                    "forwarded_to_leader": false
                }

The request is stuck at paxos:wait_for_readable, the last event recorded.

Code logic

PaxosService::dispatch(MonOpRequestRef op)
...
   if (!is_readable(m->version)) {
     dout(10) << " waiting for paxos -> readable (v" << m->version << ")" << dendl;
     wait_for_readable(op, new C_RetryMessage(this, op), m->version);
     return true;
   }
...

 bool Paxos::is_readable(version_t v)
 {
...
     ret =
       (mon->is_peon() || mon->is_leader()) &&
       (is_active() || is_updating() || is_writing()) &&
       last_committed > 0 && is_lease_valid(); // must have a value alone, or have lease
   dout(5) << __func__ << " = " << (int)ret
           << " - now=" << ceph_clock_now()
           << " lease_expire=" << lease_expire
           << " has v" << v << " lc " << last_committed
           << dendl;
   return ret;
 }

 bool Paxos::is_lease_valid()
 {
   return ((mon->get_quorum().size() == 1)
           || (ceph::real_clock::now() < lease_expire));
 }

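The is_lease_valid check compares the local clock against lease_expire, a timestamp that the leader mon sends to each peon mon. Because the local clock on the affected peon had drifted, its local time was later than the lease_expire it received from the leader, so that mon concluded Paxos was not readable. Every request was therefore put on the wait_for_readable queue to wait, which ultimately caused the failure.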
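
The comparison can be reproduced in isolation. The following is a minimal sketch, not Ceph source: PeonLeaseModel, local_now, fresh_lease and the clock_skew field are hypothetical names introduced for illustration, and only the shape of is_lease_valid follows the excerpt above.

```cpp
#include <chrono>
#include <cstddef>

using Clock = std::chrono::system_clock;

// Hypothetical model of a peon's lease check; illustrative, not Ceph source.
struct PeonLeaseModel {
    Clock::time_point lease_expire;   // stamped using the leader's clock
    std::chrono::seconds clock_skew;  // peon clock minus leader clock
    std::size_t quorum_size;

    // Time as the peon sees it when the leader's clock reads leader_now.
    Clock::time_point local_now(Clock::time_point leader_now) const {
        return leader_now + clock_skew;
    }

    // Same shape as the is_lease_valid() excerpt: a single-mon quorum
    // bypasses the lease, otherwise local time must be before lease_expire.
    bool is_lease_valid(Clock::time_point leader_now) const {
        return quorum_size == 1 || local_now(leader_now) < lease_expire;
    }
};

// Builds a peon that has just been granted a lease of `lease` seconds.
inline PeonLeaseModel fresh_lease(Clock::time_point leader_now,
                                  std::chrono::seconds lease,
                                  std::chrono::seconds skew,
                                  std::size_t quorum_size) {
    return PeonLeaseModel{leader_now + lease, skew, quorum_size};
}
```

With a 5s lease in a three-mon quorum, a peon whose clock is in sync sees a valid lease immediately after renewal, while a peon running 6s ahead sees the same lease as already expired. With a quorum size of 1 the check passes regardless of skew.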

Conclusion

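Clock drift on a mon leaves that mon unable to process requests and produces slow requests; when the stalled requests are OSD beacons, the OSDs end up marked down. By default, each time the leader mon renews the lease it sets lease_expire to the current time plus 5s, so once a peon mon's clock runs more than 5s ahead, Paxos on that peon becomes unreadable.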
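
The 5s threshold can be checked with a small back-of-the-envelope model. This is a sketch under stated assumptions: readable_fraction is a hypothetical helper, the 3s renewal period is an illustrative value rather than something taken from this cluster, and network delay is ignored. Within each renewal period the peon sees a valid lease only for the first (lease - skew) seconds, capped at the period length.

```cpp
#include <algorithm>

// Hypothetical helper: fraction of time during which a peon whose clock runs
// skew_s seconds ahead of the leader still sees an unexpired lease.
// Assumptions: the leader renews every renew_every_s seconds (illustrative
// value), each renewal sets lease_expire = leader time + lease_s, and
// message delay is zero.
double readable_fraction(double skew_s,
                         double lease_s = 5.0,
                         double renew_every_s = 3.0) {
    // Seconds after each renewal for which local time < lease_expire.
    const double remaining = lease_s - skew_s;
    // Clamp to [0, renew_every_s]: the next renewal starts a new window.
    const double valid_window = std::max(0.0, std::min(remaining, renew_every_s));
    return valid_window / renew_every_s;
}
```

Under these assumptions a skew of 0s gives a fraction of 1 (always readable), a skew of 3.5s gives 0.5 (readable half of the time), and any skew of 5s or more gives 0, which matches the permanently stuck beacon seen in this incident.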
