Prerequisites

Reference documentation:

Operating system version: CentOS 7.6
Cluster Network
Public Network
The Cluster Network is used for communication inside the cluster.
The Public Network is used to serve clients.
Configure hosts resolution

Set up the hosts file on ceph-mon1:

    cat >> /etc/hosts << EOF
    ceph-mon1
    ceph-mon2
    ceph-mon3
    ceph-osd4
    ceph-mon1
    ceph-mon2
    ceph-mon3
    ceph-osd4
    EOF
    # Note: Ceph daemons include the hostname in the file names of their UNIX sockets,
    # so each node's hostname must match the name configured in the hosts file.

    # Generate an SSH key pair without a passphrase and copy the public key to every node
    ssh-keygen -t rsa -P ''
    for i in `tail -n 4 /etc/hosts | awk '{print $1}'`; do ssh-copy-id $i; done

    # Copy the hosts file to every node
    for i in `tail -n 4 /etc/hosts | awk '{print $1}'`; do scp /etc/hosts $i:/etc/; done
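The loops above rely on the cluster entries being the last four lines of /etc/hosts. A slightly more defensive sketch selects the entries by name instead. The function name `cluster_hosts`, the `ceph-` prefix filter, and the assumption that every entry has the form `<IP> <hostname>` are illustrative and not part of the original setup.

```shell
# Print each cluster hostname once, taken from the second column of a
# hosts-style file. Comment lines, blank lines and names that do not
# start with the given prefix (default: "ceph-") are ignored.
# Assumption: cluster entries look like "<IP> <hostname>".
cluster_hosts() {
    hosts_file="${1:-/etc/hosts}"
    prefix="${2:-ceph-}"
    awk -v prefix="$prefix" '
        $1 ~ /^#/ { next }
        NF >= 2 && index($2, prefix) == 1 && !seen[$2]++ { print $2 }
    ' "$hosts_file"
}

# Usage sketch (prints what would be run instead of running it):
#   for host in $(cluster_hosts /etc/hosts); do
#       echo "ssh-copy-id $host"
#   done
```

Because each host may appear once per network, the function de-duplicates names so that every node is visited only once.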
Configure the Ceph yum repository

Note: choose one of the two configurations below, depending on your CentOS version.

CentOS 7

    [Ceph]
    name=Ceph packages for $basearch
    baseurl=$basearch
    enabled=1
    gpgcheck=0
    type=rpm-md

    [Ceph-noarch]
    name=Ceph noarch packages
    baseurl=
    enabled=1
    gpgcheck=0
    type=rpm-md

    [ceph-source]
    name=Ceph source packages
    baseurl=
    enabled=1
    gpgcheck=0
    type=rpm-md

CentOS 8

    [Ceph]
    name=Ceph packages for $basearch
    baseurl=$basearch
    enabled=1
    gpgcheck=0
    type=rpm-md

    [Ceph-noarch]
    name=Ceph noarch packages
    baseurl=
    enabled=1
    gpgcheck=0
    type=rpm-md

    [ceph-source]
    name=Ceph source packages
    baseurl=
    enabled=1
    gpgcheck=0
    type=rpm-md
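Since the two configurations differ only in their base URL, the file can also be generated from a single variable. The following is a minimal sketch: the function name `render_ceph_repo` is hypothetical, the URL in the usage comment is a placeholder rather than the mirror used by the original setup, and the `noarch` and `SRPMS` subdirectories are an assumption about the mirror layout.

```shell
# Print a ceph.repo file for the given base URL.
# Assumption: the mirror exposes <base>/$basearch, <base>/noarch and <base>/SRPMS.
# "\$basearch" is escaped so that yum, not the shell, expands it.
render_ceph_repo() {
    base_url="$1"
    cat << EOF
[Ceph]
name=Ceph packages for \$basearch
baseurl=${base_url}/\$basearch
enabled=1
gpgcheck=0
type=rpm-md

[Ceph-noarch]
name=Ceph noarch packages
baseurl=${base_url}/noarch
enabled=1
gpgcheck=0
type=rpm-md

[ceph-source]
name=Ceph source packages
baseurl=${base_url}/SRPMS
enabled=1
gpgcheck=0
type=rpm-md
EOF
}

# Usage sketch (placeholder URL):
#   render_ceph_repo "https://mirror.example/ceph/el7" > /etc/yum.repos.d/ceph.repo
```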
Copy the Ceph yum repository file to the other nodes:

    for i in `tail -n 4 /etc/hosts | awk '{print $1}'`; do scp /etc/yum.repos.d/ceph.repo $i:/etc/yum.repos.d/; done
Install dependencies

Install python3 and podman:

    for i in `tail -n 4 /etc/hosts | awk '{print $1}'`; do ssh $i exec yum install python3 podman -y; done

    cat > /etc/containers/registries.conf << EOF
    unqualified-search-registries = [""]
    [[registry]]
    prefix = ""
    location = ""
    EOF

    # Copy registries.conf to every node
    for i in `tail -n 4 /etc/hosts | awk '{print $1}'`; do scp /etc/containers/registries.conf $i:/etc/containers/; done
Install cephadm

cephadm is still iterating rapidly at the time of writing.

    wget
    chmod a+x cephadm && cp cephadm /usr/bin/cephadm
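Before bootstrapping, it can help to confirm that the tools installed so far are actually on the PATH. A minimal sketch follows; the function name `missing_tools` is hypothetical.

```shell
# Report every given command that is not found on PATH.
# Returns 0 when all commands are present, 1 otherwise.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" > /dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}

# Usage sketch:
#   missing_tools python3 podman cephadm || echo "install the tools listed above first"
```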
Deploy the mon nodes

Run on ceph-mon1:

    mkdir -p /etc/ceph
    cephadm bootstrap --mon-ip

During this process, cephadm automatically pulls the images, creates the keyring, creates the configuration file, deploys the mgr daemon, and so on.

    cd /etc/ceph/
    ssh-copy-id -f -i root@ceph-mon2
    ssh-copy-id -f -i root@ceph-mon3
    ssh-copy-id -f -i root@ceph-osd4

    # Disable automatic mon deployment. Without this step, cephadm deploys
    # mon and mgr daemons on every host that has been added.
    cephadm shell -- ceph orch apply mon --unmanaged
    cephadm shell -- ceph orch host add ceph-mon2
    cephadm shell -- ceph orch host add ceph-mon3
    cephadm shell -- ceph orch host add ceph-osd4

    cephadm shell -- ceph orch host label add ceph-mon1 mon
    cephadm shell -- ceph orch host label add ceph-mon2 mon
    cephadm shell -- ceph orch host label add ceph-mon3 mon

    cephadm shell -- ceph orch apply mon label:mon

    cephadm shell -- ceph status
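The host and label commands above follow a fixed pattern, so they can be generated from a host list. The sketch below only prints the commands so they can be reviewed before anything is executed; the function name `print_host_onboarding` is hypothetical.

```shell
# Print (do not run) the commands that add each host to the orchestrator.
# When a non-empty label is given, also print the matching label command.
print_host_onboarding() {
    label="$1"
    shift
    for host in "$@"; do
        echo "cephadm shell -- ceph orch host add $host"
        if [ -n "$label" ]; then
            echo "cephadm shell -- ceph orch host label add $host $label"
        fi
    done
}

# Usage sketch:
#   print_host_onboarding mon ceph-mon2 ceph-mon3
#   print_host_onboarding "" ceph-osd4
```

Piping the reviewed output to `sh` would execute it; keeping generation and execution separate makes it easier to spot a wrong hostname first.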
Deploy OSDs

List all available devices on the nodes:

    cephadm shell -- ceph orch device ls

    cephadm shell -- ceph orch daemon add osd ceph-mon1:/dev/sdb
    cephadm shell -- ceph orch daemon add osd ceph-mon1:/dev/sdc
    cephadm shell -- ceph orch daemon add osd ceph-mon2:/dev/sdb
    cephadm shell -- ceph orch daemon add osd ceph-mon2:/dev/sdc
    cephadm shell -- ceph orch daemon add osd ceph-mon3:/dev/sdb
    cephadm shell -- ceph orch daemon add osd ceph-mon3:/dev/sdc
    cephadm shell -- ceph orch daemon add osd ceph-osd4:/dev/sdb
    cephadm shell -- ceph orch daemon add osd ceph-osd4:/dev/sdc

    cephadm shell -- ceph status
    cephadm shell -- ceph osd tree
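When every node contributes the same devices, the per-device commands can be produced with a nested loop. This is a sketch under that assumption; the function name `print_osd_adds` is hypothetical, and the device list should be checked against the output of `ceph orch device ls` before any generated command is run.

```shell
# Print one "daemon add osd" command per host/device pair.
# $1: space-separated host names, $2: space-separated device paths.
print_osd_adds() {
    hosts="$1"
    devices="$2"
    for host in $hosts; do
        for dev in $devices; do
            echo "cephadm shell -- ceph orch daemon add osd ${host}:${dev}"
        done
    done
}

# Usage sketch:
#   print_osd_adds "ceph-mon1 ceph-mon2 ceph-mon3 ceph-osd4" "/dev/sdb /dev/sdc"
```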
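File system management

    # How many MDS daemons to deploy, and on which nodes
    # This step only deploys the daemons
    ceph orch apply mds myfs --placement="3 ceph-mon1 ceph-mon2 ceph-mon3"

    # Create an MDS daemon on a specific node
    ceph orch daemon add mds myfs --placement '1 ceph-mon2'

    # This also works; 3 is the number of MDS daemons this file system should use
    ceph fs volume create myfs 3

    # Set the minimum number of standby (cold-standby) MDS daemons a file system needs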
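Set up MDS hot standby

    # Whether to enable hot standby (standby-replay) for a file system
    ceph fs set <fs name> allow_standby_replay <bool>

    # By default, an MDS added to a file system is in the cold-standby (Standby) state,
    # i.e. once the active MDS goes down, the standby MDS needs to ...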
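ORCH daemon management

    # List all containers running on the current node
    podman ps
    docker ps

    # Show the current status of the orchestrator module; two backends are
    # currently supported, cephadm and rook
    ceph orch status

    # List all hosts managed by the orchestrator module
    ceph orch host ls

    # Bring a host under the orchestrator's management
    ssh-copy-id -f -i /etc/ceph/ ceph-node02
    ceph orch host add ceph-node02

    # Add a label to a host; HOSTNAME must match the output of the
    # hostname command on the target node
    ceph orch host label add <HOSTNAME> <label>

    # Remove a host from the cluster. This does not stop the Ceph daemons on the
    # target node; it only takes the node out of the orchestrator's automatic management
    ceph orch host rm <HOSTNAME>

    # Check whether a host can be safely removed without affecting the cluster
    ceph orch host ok-to-stop <HOSTNAME>

    # Add a daemon. The host name must match the target node. Unless necessary,
    # pass no extra arguments and specify only the host name.
    # The orchestrator module bootstraps the new daemon on the target node automatically
    ceph orch daemon add osd <HOSTNAME>:<DEVICE_NAME>
    ceph orch daemon add mgr <HOSTNAME>
    ceph orch daemon add mon <HOSTNAME>
    ceph orch daemon add rgw <HOSTNAME>

    # Remove a daemon; DAEMON_NAME is usually TYPE.ID, e.g. mon.ceph-node01
    ceph orch daemon rm <DAEMON_NAME>

    # Restart a daemon
    ceph orch daemon restart <DAEMON_NAME>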
MGR management

    # List the services the MGR is currently running
    # Useful for finding the Dashboard URL
    ceph mgr services

    # User account and permission commands
    dashboard ac-role-add-scope-perms <rolename> <scopename> <permissions>...
        Add the scope permissions for a role
    dashboard ac-role-create <rolename> [<description>]
        Create a new access control role
    dashboard ac-role-del-scope-perms <rolename> <scopename>
        Delete the scope permissions for a role
    dashboard ac-role-delete <rolename>
        Delete an access control role
    dashboard ac-role-show [<rolename>]
        Show role info
    dashboard ac-user-add-roles <username> <roles>...
        Add roles to user
    dashboard ac-user-create <username> [<password>] [<rolename>] [<name>]
        Create a user
    dashboard ac-user-del-roles <username> <roles>...
        Delete roles from user
    dashboard ac-user-delete <username>
        Delete user
    dashboard ac-user-disable <username>
        Disable a user
    dashboard ac-user-enable <username>
        Enable a user
    dashboard ac-user-set-info <username> <name> <email>
        Set user info
    dashboard ac-user-set-password <username> <password> [--force-password]
        Set user password
    dashboard ac-user-set-password-hash <username> <hashed_password>
        Set user password bcrypt hash
    dashboard ac-user-set-roles <username> <roles>...
        Set user roles
    dashboard ac-user-show [<username>]
        Show user info

    # RBAC
    # List all Dashboard roles
    ceph dashboard ac-role-show

    # Show the permissions of a role
    ceph dashboard ac-role-show administrator

    # Create a role
    ceph dashboard ac-role-create rbd-image-manager

    # Grant permissions to a role. Note: permissions for different scopes
    # must be granted in separate commands
    ceph dashboard ac-role-add-scope-perms rbd-image-manager rbd-image create delete read update
    ceph dashboard ac-role-add-scope-perms rbd-image-manager pool read

    # Remove a role's permissions for a scope
    ceph dashboard ac-role-del-scope-perms rbd-image-manager dashboard-settings

    # List all Dashboard users
    ceph dashboard ac-user-show

    # Show the details of a specific user
    ceph dashboard ac-user-show <USERNAME>

    # Create a user and set its password
    ceph dashboard ac-user-create rbd-user Vhwwls123.

    # Bind a role to a user
    ceph dashboard ac-user-add-roles rbd-user rbd-image-manager

    # Change a user's password
    ceph dashboard ac-user-set-password rbd-user Vhwwls123.
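Passing a password as a command-line argument leaves it in the shell history. As a hedged alternative, the sketch below writes the password to a private temporary file. It assumes a Ceph release whose dashboard commands accept the password via `-i <file>`; releases matching the help text above take it as a positional argument instead. The function name `make_password_file` and the example password are hypothetical.

```shell
# Write a password to a temporary file readable only by its owner
# and print the path of that file.
# Assumption: the Ceph release in use accepts "-i <file>" for dashboard passwords.
make_password_file() {
    old_umask=$(umask)
    umask 077                 # files created from here on are owner-only
    pw_file=$(mktemp)
    umask "$old_umask"
    printf '%s' "$1" > "$pw_file"
    echo "$pw_file"
}

# Usage sketch:
#   pw_file=$(make_password_file 'example-password')
#   ceph dashboard ac-user-create rbd-user -i "$pw_file"
#   rm -f "$pw_file"
```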
PG autoscaler configuration

    # Show the PG autoscale status of all pools
    ceph osd pool autoscale-status

    # Set the PG autoscale mode of a single pool
    # on: autoscaling is always enabled
    # off: autoscaling is disabled
    # warn: raise a health warning when the PG count is too high or too low,
    #       without adjusting it automatically
    ceph osd pool set <pool-name> pg_autoscale_mode off

    # Show the current default PG autoscale mode
    ceph config get mon osd_pool_default_pg_autoscale_mode

    # Set the default PG autoscale mode
    ceph config set mon osd_pool_default_pg_autoscale_mode warn
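Since only the modes on, off and warn are meaningful here, a small wrapper can reject typos before they reach the cluster. This is a sketch that prints the command rather than running it; the function name `pg_autoscale_cmd` and the pool name in the usage comment are hypothetical.

```shell
# Print the command that sets a pool's PG autoscale mode,
# after checking that the mode is one of on, off or warn.
pg_autoscale_cmd() {
    pool="$1"
    mode="$2"
    case "$mode" in
        on|off|warn)
            echo "ceph osd pool set $pool pg_autoscale_mode $mode"
            ;;
        *)
            echo "invalid mode: $mode (expected on, off or warn)" >&2
            return 1
            ;;
    esac
}

# Usage sketch:
#   pg_autoscale_cmd mypool warn
```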
Log management

    # Enable the most verbose logging for one class of daemons
    ceph tell 'mds.*' injectargs '--debug_mds 20/20'
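Verbose logging fills disks quickly, so it is worth lowering the level again after debugging. The sketch below builds the `injectargs` command for a daemon type and level. The function name `debug_cmd` is hypothetical, and `1/5` in the usage comment is an assumed default for `debug_mds` that should be checked against the running release.

```shell
# Print the "ceph tell" command that sets the debug level for
# every daemon of one type (for example mds, osd or mon).
debug_cmd() {
    daemon_type="$1"
    level="$2"
    printf "ceph tell '%s.*' injectargs '--debug_%s %s'\n" \
        "$daemon_type" "$daemon_type" "$level"
}

# Usage sketch:
#   debug_cmd mds 20/20   # raise verbosity while debugging
#   debug_cmd mds 1/5     # lower it again (assumed default level)
```

The pattern `mds.*` is single-quoted in the generated command so that the shell does not expand it against file names in the current directory.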