btrfs RAID1: handling a dropped disk plus metadata space exhausted
root@localhost:~# btrfs fi usage -T /mnt/btrfs_practice/
WARNING: failed to get device size for <missing disk>: No such file or directory
Overall:
Device size: 1.50GiB
Device allocated: 1022.00MiB
Device unallocated: 514.00MiB
Device missing: 1.00GiB
Device slack: 0.00B
Used: 672.43MiB
Free (estimated): 348.16MiB (min: 348.16MiB)
Free (statfs, df): 91.16MiB
Data ratio: 2.00
Metadata ratio: 2.00
Global reserve: 5.50MiB (used: 0.00B)
Multiple profiles: no
Data Metadata System
Id Path RAID1 RAID1 RAID1 Unallocated Total Slack
-- -------------- --------- -------- -------- ----------- --------- -----
1 <missing disk> 426.25MiB 76.75MiB 8.00MiB 513.00MiB 1.00GiB -
2 /dev/loop2 426.25MiB 76.75MiB 8.00MiB 1.00MiB 512.00MiB -
-- -------------- --------- -------- -------- ----------- --------- -----
Total 426.25MiB 76.75MiB 8.00MiB 514.00MiB 1.50GiB 0.00B
Used 335.09MiB 1.11MiB 16.00KiB
A direct replace does not work:
root@localhost:~# btrfs replace start -B 1 /dev/loop3 /mnt/btrfs_practice/
Performing full device TRIM /dev/loop3 (1.00GiB) ...
ERROR: ioctl(DEV_REPLACE_START) failed on "/mnt/btrfs_practice/": No space left on device
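Procedure:
Create a new btrfs on /dev/loop3 and mount it at /tmp/loop3, then take a read-only snapshot of /mnt/btrfs_practice as /mnt/btrfs_practice/backup (btrfs send only accepts read-only snapshots).
btrfs send /mnt/btrfs_practice/backup | btrfs receive /tmp/loop3
btrfs subvolume set-default /tmp/loop3/backup
Then remount loop3 and the data is usable again.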
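The procedure above can be sketched as a dry-run helper that only prints the commands, so nothing is touched until each line has been reviewed. plan_migration is a hypothetical name, the mkfs/mount/umount lines are my assumption about the steps the notes leave implicit, and the -r flag is there because btrfs send needs a read-only snapshot.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the migration commands.
# plan_migration is a hypothetical helper, not part of btrfs-progs.
plan_migration() {
    src=$1       # degraded source mount point, e.g. /mnt/btrfs_practice
    newdev=$2    # spare device, e.g. /dev/loop3
    dst=$3       # mount point for the new filesystem, e.g. /tmp/loop3
    snap=backup  # snapshot name used in the notes

    echo "mkfs.btrfs $newdev"
    echo "mkdir -p $dst"
    echo "mount $newdev $dst"
    # btrfs send requires a read-only snapshot, hence -r.
    echo "btrfs subvolume snapshot -r $src $src/$snap"
    echo "btrfs send $src/$snap | btrfs receive $dst"
    echo "btrfs subvolume set-default $dst/$snap"
    # Remount so the new default subvolume becomes the visible root.
    echo "umount $dst"
    echo "mount $newdev $dst"
}

plan_migration /mnt/btrfs_practice /dev/loop3 /tmp/loop3
```

One thing to keep in mind: a received subvolume is read-only, so if the new filesystem should be writable, a writable snapshot of backup would probably be the better target for set-default.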
Fix device size
┌────────────────────────┬───────────────────────────────────────────────────────────────┐
│ Item                   │ Value                                                         │
├────────────────────────┼───────────────────────────────────────────────────────────────┤
│ Recorded in superblock │ 1073741824 bytes (1 GiB)                                      │
├────────────────────────┼───────────────────────────────────────────────────────────────┤
│ Actual device          │ 1072693248 bytes (1 MiB smaller)                              │
├────────────────────────┼───────────────────────────────────────────────────────────────┤
│ Error                  │ total_bytes should be at most 1072693248 but found 1073741824 │
└────────────────────────┴───────────────────────────────────────────────────────────────┘
root@localhost:~# mount /dev/mapper/dust_btrfs /mnt/btrfs_practice/
mount: /mnt/btrfs_practice: wrong fs type, bad option, bad superblock on /dev/mapper/dust_btrfs, missing codepage or helper program, or other error.
dmesg(1) may have more information after failed mount system call.
# dmesg...
[12387.417768] BTRFS info (device dm-1): using crc32c (crc32c-lib) checksum algorithm
[12387.425092] BTRFS error (device dm-1): device total_bytes should be at most 1072693248 but found 1073741824
[12387.425104] BTRFS error (device dm-1): failed to read chunk tree: -22
[12387.429301] BTRFS error (device dm-1): open_ctree failed: -22
root@localhost:~# btrfs rescue fix-device-size /dev/mapper/dust_btrfs
Fixed device size for devid 1, old size: 1073741824 new size: 1072693248
Fixed super total bytes, old size: 1073741824 new size: 1072693248
Fixed unaligned/mismatched total_bytes for super block and device items
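To double-check the numbers before running fix-device-size, the two sizes can be pulled straight out of the kernel message and subtracted. This is a minimal sketch; size_overshoot is a hypothetical helper name and it only parses text, it does not touch the device.

```shell
#!/bin/sh
# Extract both sizes from the kernel's "total_bytes" error line and print
# how far the recorded size overshoots the real device.
# size_overshoot is a hypothetical helper name.
size_overshoot() {
    # $1: one dmesg line containing "should be at most <max> but found <found>"
    printf '%s\n' "$1" | sed -n \
        's/.*should be at most \([0-9][0-9]*\) but found \([0-9][0-9]*\).*/\1 \2/p' |
    while read -r max found; do
        echo "device=$max recorded=$found overshoot=$((found - max))"
    done
}

size_overshoot '[12387.425092] BTRFS error (device dm-1): device total_bytes should be at most 1072693248 but found 1073741824'
```

For the line from this exercise the overshoot is 1048576 bytes, i.e. exactly the 1 MiB in the table. On a live system the real size should also be readable with blockdev --getsize64 on the device.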
Unexpected power loss: clearing the log tree with zero-log (bad tree block start, mirror 1 want 524288000 have 0)
[35709.020914] BTRFS info (device loop1): start tree-log replay
[35709.021153] BTRFS error (device loop1): bad tree block start, mirror 1 want 524288000 have 0
[35709.021317] BTRFS warning (device loop1): failed to read log tree
[35709.026257] BTRFS error (device loop1): open_ctree failed: -5
root@localhost:~# mount -o loop ./btrfs11.img /mnt/btrfs_practice/
mount: /mnt/btrfs_practice: can't read superblock on /dev/loop1.
dmesg(1) may have more information after failed mount system call.
btrfs rescue zero-log
root@localhost:~# btrfs rescue zero-log ./btrfs11.img
checksum verify failed on 524288000 wanted 0x00000000 found 0xb6bde3e4
Couldn't setup log root tree
Clearing log on ./btrfs11.img, previous log_root 524288000, level 0
root@localhost:~# mount -o loop ./btrfs11.img /mnt/btrfs_practice/
[35807.682528] loop1: detected capacity change from 0 to 2097152
[35807.695142] BTRFS: device fsid 43f62df9-e3bc-483f-afba-f10edc4e12f0 devid 1 transid 13 /dev/loop1 (7:1) scanned by mount (44264)
[35807.696078] BTRFS info (device loop1): first mount of filesystem 43f62df9-e3bc-483f-afba-f10edc4e12f0
[35807.696270] BTRFS info (device loop1): using crc32c (crc32c-lib) checksum algorithm
[35807.713139] BTRFS info (device loop1): enabling ssd optimizations
[35807.713150] BTRFS info (device loop1): enabling free space tree
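zero-log throws away the tree log, so whatever was fsynced after the last committed transaction is lost. Before running it, it seems worth confirming that the unreadable block really is the log tree root. A rough sketch, assuming the superblock dump (btrfs inspect-internal dump-super) prints a line of the form "log_root <bytenr>"; bad_block_is_log_root is a hypothetical helper and works on saved text only.

```shell
#!/bin/sh
# Check that the block the kernel failed to read is the log tree root.
# Inputs are plain text: saved dmesg output and saved dump-super output.
# bad_block_is_log_root is a hypothetical helper name.
bad_block_is_log_root() {
    dmesg_text=$1
    dump_super_text=$2
    # First "bad tree block start ... want N" value reported by the kernel.
    want=$(printf '%s\n' "$dmesg_text" |
        sed -n 's/.*bad tree block start.* want \([0-9][0-9]*\) have.*/\1/p' |
        head -n 1)
    # "log_root <bytenr>" line of the superblock dump (format assumed).
    log_root=$(printf '%s\n' "$dump_super_text" |
        awk '$1 == "log_root" { print $2; exit }')
    [ -n "$want" ] && [ "$want" = "$log_root" ]
}

# Sample values taken from the transcript above.
sample_dmesg='[35709.021153] BTRFS error (device loop1): bad tree block start, mirror 1 want 524288000 have 0'
sample_super='log_root 524288000'
if bad_block_is_log_root "$sample_dmesg" "$sample_super"; then
    echo "log tree root is the unreadable block: zero-log is a candidate"
fi
```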
bad superblock
root@localhost:~# mount -o loop ./btrfs33.img /mnt/btrfs_practice/
mount: /mnt/btrfs_practice: wrong fs type, bad option, bad superblock on /dev/loop1, missing codepage or helper program, or other error.
dmesg(1) may have more information after failed mount system call.
[37848.878398] loop1: detected capacity change from 0 to 2097152
[37848.890849] EXT4-fs (loop1): VFS: Can't find ext4 filesystem
[37848.891709] EXT4-fs (loop1): VFS: Can't find ext4 filesystem
[37848.892328] EXT4-fs (loop1): VFS: Can't find ext4 filesystem
[37848.910358] ISOFS: Unable to identify CD-ROM format.
[37848.910740] FAT-fs (loop1): bogus number of reserved sectors
[37848.910750] FAT-fs (loop1): Can't find a valid FAT filesystem
[37848.911521] hfs: can't find a HFS filesystem on dev loop1
[37848.912326] hfsplus: unable to find HFS+ superblock
[37848.916174] XFS (loop1): Invalid superblock magic number
btrfs rescue super-recover <device>
root@localhost:~# btrfs rescue super-recover ./btrfs33.img
Make sure this is a btrfs disk otherwise the tool will destroy other fs, Are you sure? [y/N]: y
Recovered bad superblocks successful
root@localhost:~# mount -o loop ./btrfs33.img /mnt/btrfs_practice/
root@localhost:~# findmnt /mnt/btrfs_practice
TARGET SOURCE FSTYPE OPTIONS
/mnt/btrfs_practice /dev/loop1 btrfs rw,relatime,seclabel,ssd,space_cache=v2,subvolid=5,subvol=/
root@localhost:~#
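Since mount fell through to ext4, FAT, XFS and friends, my guess is that the primary superblock no longer looked like btrfs at all. A quick way to see which copies still carry the btrfs magic is to read 8 bytes at each superblock offset plus 64. This is a sketch under that assumption; check_sb_magic is a hypothetical helper, it only reads the image, and it skips the third copy at 256 GiB.

```shell
#!/bin/sh
# Report which btrfs superblock copies carry the expected magic string.
# The magic "_BHRfS_M" sits 64 bytes into each superblock; copies live at
# 64 KiB and 64 MiB (a third one at 256 GiB is skipped here).
# check_sb_magic is a hypothetical helper name.
check_sb_magic() {
    img=$1
    for sb in 65536 67108864; do
        # tr strips NUL bytes so a zeroed region compares as an empty string.
        magic=$(dd if="$img" bs=1 skip=$((sb + 64)) count=8 2>/dev/null |
            tr -d '\000')
        if [ "$magic" = "_BHRfS_M" ]; then
            echo "$sb ok"
        else
            echo "$sb bad"
        fi
    done
}

# Usage: sh check_sb_magic.sh ./btrfs33.img
if [ $# -ge 1 ]; then
    check_sb_magic "$1"
fi
```

If the primary shows bad and the mirror shows ok, that is the situation super-recover is meant for: it rewrites the damaged copy from a good one.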
couldn't read tree root && open_ctree failed: -5 (comprehensive exercise)
dmesg
[38732.871141] loop1: detected capacity change from 0 to 2097152
[38732.882712] BTRFS: device fsid 160086ed-2388-472f-92c6-47ee8e2e1407 devid 1 transid 12 /dev/loop1 (7:1) scanned by mount (56215)
[38732.884333] BTRFS info (device loop1): first mount of filesystem 160086ed-2388-472f-92c6-47ee8e2e1407
[38732.884353] BTRFS info (device loop1): using crc32c (crc32c-lib) checksum algorithm
[38732.890271] BTRFS error (device loop1): bad tree block start, mirror 1 want 33423360 have 0
[38732.890479] BTRFS error (device loop1): bad tree block start, mirror 2 want 33423360 have 0
[38732.890533] BTRFS warning (device loop1): couldn't read tree root
[38732.894913] BTRFS error (device loop1): open_ctree failed: -5
btrfs check --readonly
root@localhost:~# btrfs check --readonly ./btrfs44.img
Opening filesystem to check...
checksum verify failed on 33423360 wanted 0x00000000 found 0xb6bde3e4
checksum verify failed on 33423360 wanted 0x00000000 found 0xb6bde3e4
checksum verify failed on 33423360 wanted 0x00000000 found 0xb6bde3e4
bad tree block 33423360, bytenr mismatch, want=33423360, have=0
Couldn't read tree root
ERROR: cannot open file system
btrfs check --repair fails
root@localhost:~# btrfs check --repair ./btrfs44.img
enabling repair mode
WARNING:
Do not use --repair unless you are advised to do so by a developer
or an experienced user, and then only after having accepted that no
fsck can successfully repair all types of filesystem corruption. E.g.
some software or hardware bugs can fatally damage a volume.
The operation will start in 10 seconds.
Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1
Starting repair.
Opening filesystem to check...
checksum verify failed on 33423360 wanted 0x00000000 found 0xb6bde3e4
checksum verify failed on 33423360 wanted 0x00000000 found 0xb6bde3e4
checksum verify failed on 33423360 wanted 0x00000000 found 0xb6bde3e4
bad tree block 33423360, bytenr mismatch, want=33423360, have=0
Couldn't read tree root
ERROR: cannot open file system
root@localhost:~#
Using a backup superblock
Starting point selection:
-s|--super <superblock> use this superblock copy
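My practice image is only 1 GiB, so only two superblock copies fit on it (the primary at 64 KiB and one mirror at 64 MiB).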
root@localhost:~# btrfs check -s 1 ./btrfs44.img
using SB copy 1, bytenr 67108864
Opening filesystem to check...
checksum verify failed on 33423360 wanted 0x00000000 found 0xb6bde3e4
checksum verify failed on 33423360 wanted 0x00000000 found 0xb6bde3e4
checksum verify failed on 33423360 wanted 0x00000000 found 0xb6bde3e4
bad tree block 33423360, bytenr mismatch, want=33423360, have=0
Couldn't read tree root
ERROR: cannot open file system
The backup superblock fails as well.
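Which -s indexes are usable depends only on the device size: the copies sit at 64 KiB, 64 MiB and 256 GiB, and each one is 4 KiB long. A small sketch of that arithmetic; sb_copies_for_size is a hypothetical helper name.

```shell
#!/bin/sh
# List the superblock copies that fit on a device of the given size.
# sb_copies_for_size is a hypothetical helper name.
sb_copies_for_size() {
    size=$1   # device or image size in bytes
    idx=0
    for off in 65536 67108864 274877906944; do
        # A copy is usable only if the whole 4 KiB superblock fits.
        if [ $((off + 4096)) -le "$size" ]; then
            echo "copy $idx at bytenr $off"
        fi
        idx=$((idx + 1))
    done
}

sb_copies_for_size 1073741824
```

For the 1 GiB image this lists copy 0 and copy 1 only, which matches "using SB copy 1, bytenr 67108864" in the transcript. Note also that the check with -s 1 still asks for block 33423360, so both copies apparently point at the same damaged tree root, and switching superblocks cannot help here.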
btrfs restore fails
root@localhost:~# btrfs restore -i -o ./btrfs44.img ./btrfs44_restored/
checksum verify failed on 33423360 wanted 0x00000000 found 0xb6bde3e4
checksum verify failed on 33423360 wanted 0x00000000 found 0xb6bde3e4
checksum verify failed on 33423360 wanted 0x00000000 found 0xb6bde3e4
bad tree block 33423360, bytenr mismatch, want=33423360, have=0
Couldn't read tree root
Could not open root, trying backup super
checksum verify failed on 33423360 wanted 0x00000000 found 0xb6bde3e4
checksum verify failed on 33423360 wanted 0x00000000 found 0xb6bde3e4
checksum verify failed on 33423360 wanted 0x00000000 found 0xb6bde3e4
bad tree block 33423360, bytenr mismatch, want=33423360, have=0
Couldn't read tree root
Could not open root, trying backup super
ERROR: superblock bytenr 274877906944 is larger than device size 1073741824
Could not open root, trying backup super
btrfs-find-root
root@localhost:~# btrfs-find-root ./btrfs44.img
Couldn't read tree root
Superblock thinks the generation is 12
Superblock thinks the level is 0
Well block 32014336(gen: 11 level: 0) seems good, but generation/level doesn't match, want gen: 12 level: 0
Well block 31571968(gen: 10 level: 0) seems good, but generation/level doesn't match, want gen: 12 level: 0
Well block 30949376(gen: 9 level: 0) seems good, but generation/level doesn't match, want gen: 12 level: 0
Well block 30572544(gen: 8 level: 0) seems good, but generation/level doesn't match, want gen: 12 level: 0
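The useful part of this output is the list of older tree roots: the highest generation that still "seems good" loses the least data. A sketch for sorting the candidates, newest first; root_candidates is a hypothetical helper that only parses the text printed by btrfs-find-root.

```shell
#!/bin/sh
# Read btrfs-find-root output on stdin and print "<generation> <bytenr>"
# pairs, newest generation first.
# root_candidates is a hypothetical helper name.
root_candidates() {
    sed -n 's/^Well block \([0-9][0-9]*\)(gen: \([0-9][0-9]*\) level: [0-9][0-9]*).*/\2 \1/p' |
        sort -k1,1nr
}

# Sample lines copied from the transcript above.
root_candidates <<'EOF'
Superblock thinks the generation is 12
Well block 30949376(gen: 9 level: 0) seems good, but generation/level doesn't match, want gen: 12 level: 0
Well block 32014336(gen: 11 level: 0) seems good, but generation/level doesn't match, want gen: 12 level: 0
Well block 31571968(gen: 10 level: 0) seems good, but generation/level doesn't match, want gen: 12 level: 0
EOF
```

The first line of the result is the bytenr to try with btrfs restore -t; if that one turns out to be unusable, the next lines are the fallbacks.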
Restore from an explicitly specified tree root location
root@localhost:~# btrfs restore -t 32014336 -i -o -v ./btrfs44.img ./btrfs44_restored/
parent transid verify failed on 32014336 wanted 12 found 11
parent transid verify failed on 32014336 wanted 12 found 11
parent transid verify failed on 32014336 wanted 12 found 11
Ignoring transid failure
root@localhost:~# ls ./btrfs44_restored/
Backups Docker Documents Music Photos Thumbnails Videos
Summary
Even in a scenario this hopeless, restore still got the data out in the end. Thankful!
RAID1 missing device: basic procedure
root@localhost:~/btrfs44_restored# btrfs fi usage -T /mnt/btrfs_practice/
WARNING: failed to get device size for <missing disk>: No such file or directory
Overall:
Device size: 2.00GiB
Device allocated: 1.63GiB
Device unallocated: 374.50MiB
Device missing: 1.00GiB
Device slack: 0.00B
Used: 840.77MiB
Free (estimated): 494.55MiB (min: 494.55MiB)
Free (statfs, df): 400.43MiB
Data ratio: 2.00
Metadata ratio: 2.00
Global reserve: 5.50MiB (used: 0.00B)
Multiple profiles: no
Data Metadata System
Id Path RAID1 RAID1 RAID1 Unallocated Total Slack
-- -------------- --------- --------- -------- ----------- ------- -----
1 <missing disk> 726.38MiB 102.38MiB 8.00MiB 187.25MiB 1.00GiB -
2 /dev/loop1 726.38MiB 102.38MiB 8.00MiB 187.25MiB 1.00GiB -
-- -------------- --------- --------- -------- ----------- ------- -----
Total 726.38MiB 102.38MiB 8.00MiB 374.50MiB 2.00GiB 0.00B
Used 419.07MiB 1.30MiB 16.00KiB
root@localhost:~/btrfs44_restored# btrfs replace start 1 /dev/loop2 /mnt/btrfs_practice/
Performing full device TRIM /dev/loop2 (1.00GiB) ...
root@localhost:~/btrfs44_restored# btrfs replace status /mnt/btrfs_practice/
Started on 3.May 21:04:52, finished on 3.May 21:04:53, 0 write errs, 0 uncorr. read errs
root@localhost:~/btrfs44_restored# btrfs fi usage -T /mnt/btrfs_practice/
Overall:
Device size: 2.00GiB
Device allocated: 1.82GiB
Device unallocated: 188.25MiB
Device missing: 0.00B
Device slack: 0.00B
Used: 840.75MiB
Free (estimated): 434.50MiB (min: 431.84MiB)
Free (statfs, df): 328.43MiB
Data ratio: 1.95
Metadata ratio: 1.48
Global reserve: 5.50MiB (used: 0.00B)
Multiple profiles: yes (data, metadata, system)
Data Data Metadata Metadata System System
Id Path single RAID1 single RAID1 single RAID1 Unallocated Total Slack
-- ---------- -------- --------- --------- --------- -------- ------- ----------- ------- -----
1 /dev/loop2 - 726.38MiB - 102.38MiB - 8.00MiB 187.25MiB 1.00GiB -
2 /dev/loop1 42.25MiB 726.38MiB 112.00MiB 102.38MiB 32.00MiB 8.00MiB 1.00MiB 1.00GiB -
-- ---------- -------- --------- --------- --------- -------- ------- ----------- ------- -----
Total 42.25MiB 726.38MiB 112.00MiB 102.38MiB 32.00MiB 8.00MiB 188.25MiB 2.00GiB 0.00B
Used 0.00B 419.07MiB 0.00B 1.30MiB 16.00KiB 0.00B
root@localhost:~/btrfs44_restored# btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/btrfs_practice/
Done, had to relocate 9 out of 9 chunks
root@localhost:~/btrfs44_restored# btrfs fi usage -T /mnt/btrfs_practice/
Overall:
Device size: 2.00GiB
Device allocated: 1.50GiB
Device unallocated: 512.00MiB
Device missing: 0.00B
Device slack: 0.00B
Used: 840.83MiB
Free (estimated): 476.93MiB (min: 476.93MiB)
Free (statfs, df): 475.93MiB
Data ratio: 2.00
Metadata ratio: 2.00
Global reserve: 5.50MiB (used: 0.00B)
Multiple profiles: no
Data Metadata System
Id Path RAID1 RAID1 RAID1 Unallocated Total Slack
-- ---------- --------- -------- -------- ----------- ------- -----
1 /dev/loop2 640.00MiB 96.00MiB 32.00MiB 256.00MiB 1.00GiB -
2 /dev/loop1 640.00MiB 96.00MiB 32.00MiB 256.00MiB 1.00GiB -
-- ---------- --------- -------- -------- ----------- ------- -----
Total 640.00MiB 96.00MiB 32.00MiB 512.00MiB 2.00GiB 0.00B
Used 419.07MiB 1.33MiB 16.00KiB
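The step that is easy to forget is the balance after the replace: chunks written while the filesystem ran degraded stay in the single profile until they are converted. A sketch that looks at the "Multiple profiles" line of btrfs fi usage and prints a suggested command. suggest_convert is a hypothetical helper; the soft filter is my variation (the notes ran the conversion without it) and should limit the balance to chunks that are not RAID1 yet.

```shell
#!/bin/sh
# Read `btrfs filesystem usage` output on stdin and print a balance
# command if leftover non-RAID1 chunks are reported.
# suggest_convert is a hypothetical helper; it prints, it does not execute.
suggest_convert() {
    mnt=$1
    if grep -q 'Multiple profiles:.*yes'; then
        # "soft" skips chunks that already have the target profile.
        echo "btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft $mnt"
    else
        echo "# one profile per chunk type, no conversion needed"
    fi
}

# Sample line copied from the usage output right after the replace.
printf '%s\n' 'Multiple profiles: yes (data, metadata, system)' |
    suggest_convert /mnt/btrfs_practice/
```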
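Summary
With RAID1 this is actually fairly easy to handle: all the data is still there, so there is no need to go hunting through clues for a solution.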
chunk-recover
[41783.759328] loop1: detected capacity change from 0 to 2097152
[41783.771329] BTRFS: device fsid 3da92481-25e6-4e85-9265-dc9fb23b088d devid 1 transid 12 /dev/loop1 (7:1) scanned by mount (64417)
[41783.772925] BTRFS info (device loop1): first mount of filesystem 3da92481-25e6-4e85-9265-dc9fb23b088d
[41783.772947] BTRFS info (device loop1): using crc32c (crc32c-lib) checksum algorithm
[41783.779253] BTRFS error (device loop1): bad tree block start, mirror 1 want 22085632 have 0
[41783.779697] BTRFS error (device loop1): bad tree block start, mirror 2 want 22085632 have 0
[41783.779804] BTRFS error (device loop1): failed to read chunk root
[41783.783596] BTRFS error (device loop1): open_ctree failed: -5
root@localhost:~/btrfs44_restored# btrfs rescue chunk-recover ./btrfs66.img
Scanning: 0 in dev0
scan chunk headers error
root@localhost:~/btrfs44_restored# btrfs-find-root ./btrfs66.img
ERROR: tree block bytenr 0 is not aligned to sectorsize 4096
WARNING: cannot read chunk root, continue anyway
Superblock thinks the generation is 12
Superblock thinks the level is 0
root@localhost:~/btrfs44_restored# btrfs restore -i -o ./btrfs66.img ./btrfs66_dir
ERROR: tree block bytenr 0 is not aligned to sectorsize 4096
ERROR: cannot read chunk root
Could not open root, trying backup super
ERROR: tree block bytenr 0 is not aligned to sectorsize 4096
ERROR: cannot read chunk root
Could not open root, trying backup super
ERROR: superblock bytenr 274877906944 is larger than device size 1073741824
Could not open root, trying backup super
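Looking back over all the exercises, each dmesg message led to one particular first attempt. The mapping can be sketched as a small lookup; triage is a hypothetical helper, it only prints the command that was tried first in these notes, and as the chunk-recover case shows, the first attempt does not always succeed.

```shell
#!/bin/sh
# Read dmesg text on stdin and print the first rescue step tried in these
# notes for each recognized message. Prints only, never executes.
# triage is a hypothetical helper name.
triage() {
    while IFS= read -r line; do
        case $line in
            *"total_bytes should be at most"*)
                echo "btrfs rescue fix-device-size <device>" ;;
            *"failed to read log tree"*)
                echo "btrfs rescue zero-log <device>" ;;
            *"couldn't read tree root"*)
                echo "btrfs-find-root <device>, then btrfs restore -t <bytenr>" ;;
            *"failed to read chunk root"*)
                echo "btrfs rescue chunk-recover <device>" ;;
        esac
    done
}

# Sample line copied from the chunk-recover exercise.
printf '%s\n' '[41783.779804] BTRFS error (device loop1): failed to read chunk root' | triage
```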