[Server data recovery] Data recovery case of lvm information and VXFS file system corruption caused by raid5 crash
2022-08-11 00:50:00 【North Asia Data Recovery】
Server data recovery environment:
Seven of the eight SAS hard disks form a RAID5 array, and one is used as a hot spare.
Server failure:
Two hard disks in the RAID5 array of the failed server's storage were damaged and went offline. The RAID5 array became paralyzed, and the upper-layer LUNs could no longer be used. The administrator contacted our data recovery center, and our hardware engineer's inspection found no physical failures or bad sectors on the hard disks.
Server data recovery process:
1. Back up the data. Create images of all disks with a data recovery tool.
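Step 1 images every member disk before anything else is touched. The tool used in this case is not described, so the following is only a minimal Python sketch of the idea, assuming the source disk can be opened as a file (for example a raw block device on Linux). It copies in fixed-size chunks, zero-fills any chunk that raises a read error so later offsets stay aligned, and records a SHA-256 of the image. The function name `image_disk` and its parameters are hypothetical.

```python
import hashlib
import os


def image_disk(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy src_path to dst_path chunk by chunk.

    Returns (sha256 hex digest of the image, list of offsets that failed to read).
    """
    digest = hashlib.sha256()
    bad_offsets = []
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        size = src.seek(0, os.SEEK_END)  # total size of the source in bytes
        offset = 0
        while offset < size:
            length = min(chunk_size, size - offset)
            src.seek(offset)
            try:
                data = src.read(length)
            except OSError:
                # Unreadable region: remember it and fall through to zero-fill.
                data = b""
                bad_offsets.append(offset)
            if len(data) < length:
                # Pad so every later chunk lands at the same offset as on the source.
                data += b"\x00" * (length - len(data))
            dst.write(data)
            digest.update(data)
            offset += length
    return digest.hexdigest(), bad_offsets
```

Hashing while copying lets the image be checked later without rereading the original disk; a production imager would also retry failed regions with smaller reads.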

2. Analyze the RAID structure.
The failed server's LUNs are all built on the RAID, so the underlying RAID information must be analyzed first, and the original RAID then reconstructed from what that analysis reveals. The analysis showed that disk No. 4 is the hot spare. By analyzing how Oracle database pages are distributed across the disks, we obtained the key parameters of the RAID group: the stripe size, the disk order, and the data direction.
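The three parameters found in step 2 (stripe size, disk order, data direction) together decide where each logical byte of the array lives. The sketch below shows that mapping, assuming the four RAID5 layouts as Linux md names them (left/right, symmetric/asymmetric) and disks already listed in their recovered order. The storage in this case may use a vendor-specific layout, and the names `raid5_map` and `parity_disk` are hypothetical.

```python
LAYOUTS = ("left-symmetric", "left-asymmetric", "right-symmetric", "right-asymmetric")


def parity_disk(row: int, n_disks: int, layout: str) -> int:
    """Index of the disk that holds parity for stripe row `row`."""
    if layout.startswith("left"):
        return (n_disks - 1) - (row % n_disks)  # parity rotates from the last disk leftwards
    if layout.startswith("right"):
        return row % n_disks  # parity rotates from the first disk rightwards
    raise ValueError(f"unknown layout: {layout}")


def raid5_map(offset: int, n_disks: int, stripe_size: int, layout: str):
    """Map a byte offset in the virtual array to (disk index, byte offset on that disk)."""
    if layout not in LAYOUTS:
        raise ValueError(f"unknown layout: {layout}")
    chunk, within = divmod(offset, stripe_size)
    row, k = divmod(chunk, n_disks - 1)  # k = position of the data chunk inside its row
    p = parity_disk(row, n_disks, layout)
    if layout.endswith("asymmetric"):
        disk = k if k < p else k + 1  # data fills disks in order, skipping the parity disk
    else:
        disk = (p + 1 + k) % n_disks  # data starts on the disk right after parity
    return disk, row * stripe_size + within
```

Trying a few candidate combinations of stripe size, order, and layout, and checking whether consecutive Oracle pages land where the mapping predicts, is one way to confirm the parameters.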
3. Analyze the disks that dropped out of the RAID.
Using the RAID parameters obtained in the analysis, we virtually reassembled the original RAID with Beiya's in-house RAID virtualization program. We then carefully analyzed the data on each hard disk, verified the stripes with Beiya's in-house RAID verification program, and excluded the disk that had dropped out of the RAID first.
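Stripe verification in step 3 relies on a basic RAID5 property: the members of every stripe row XOR to zero. Once the first disk drops out, the array keeps running degraded, so stripes rewritten after that point no longer check out when the stale disk is included. The following is a minimal sketch of such a check and of XOR reconstruction on in-memory member images; `inconsistent_rows` and `rebuild_member` are hypothetical names, and deciding which member is the stale one still requires analyzing the data itself, as described above.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length buffers."""
    if len(a) != len(b):
        raise ValueError("buffers must have the same length")
    return (int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).to_bytes(len(a), "big")


def inconsistent_rows(members, stripe_size):
    """Indices of stripe rows whose chunks (data plus parity) do not XOR to zero."""
    rows = len(members[0]) // stripe_size
    bad = []
    for r in range(rows):
        chunks = [m[r * stripe_size:(r + 1) * stripe_size] for m in members]
        if any(reduce(xor_bytes, chunks)):  # any non-zero byte means a mismatch
            bad.append(r)
    return bad


def rebuild_member(members, missing):
    """Recreate member `missing` as the XOR of all the other members."""
    others = [m for i, m in enumerate(members) if i != missing]
    return reduce(xor_bytes, others)
```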
4. Analyze the LUN information in the RAID group.
After virtualizing the latest state of the RAID, we analyzed how the LUNs are allocated within it and the data block MAP of each LUN. Only the block distribution MAP of the six underlying LUNs had to be extracted; a program written from this information then parsed the data MAP of every LUN and exported each LUN's data according to its MAP.
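Exporting a LUN by its data block MAP amounts to copying each mapped range from the virtual RAID to its place in the LUN. The storage's real MAP format is not given, so this sketch assumes a hypothetical list of extents, each saying that `length` bytes at `array_offset` in the virtualized RAID belong at `lun_offset` in the LUN. `read_array` stands in for a reader over the virtual RAID, and `Extent` and `export_lun` are illustrative names.

```python
from typing import Callable, Iterable, NamedTuple


class Extent(NamedTuple):
    lun_offset: int    # byte offset inside the LUN
    array_offset: int  # byte offset inside the virtualized RAID
    length: int        # extent length in bytes


def export_lun(read_array: Callable[[int, int], bytes],
               extents: Iterable[Extent], lun_size: int) -> bytes:
    """Assemble a LUN image from its block MAP."""
    image = bytearray(lun_size)  # ranges without a MAP entry stay zero-filled
    for ext in extents:
        if ext.lun_offset + ext.length > lun_size:
            raise ValueError(f"extent {ext} runs past the end of the LUN")
        data = read_array(ext.array_offset, ext.length)
        if len(data) != ext.length:
            raise ValueError(f"short read for extent {ext}")
        image[ext.lun_offset:ext.lun_offset + ext.length] = data
    return bytes(image)
```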

5. Parse the LVM logical volume.
After analyzing all the exported LUNs, we found that every LUN contains HP-UX LVM logical volume information. Parsing the LVM information in each LUN revealed three LVM sets: one set holds a single LV storing the OA server's data, another holds a single LV storing temporary backup data, and the remaining four LUNs form a third set with a single LV storing the Oracle database files. Beiya's data recovery engineers wrote a program to interpret the LVM and tried to parse the LV in each set, but the interpreter failed with an error.
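In LVM, a logical volume is a sequence of logical extents, each stored as a physical extent on one of the volume group's physical volumes (here, the LUNs). The sketch shows how an LV interpreter turns LV offsets into PV reads once the extent table is known. It does not parse the actual HP-UX LVM on-disk structures; it assumes a hypothetical, already-decoded table `le_map` of (PV index, PE number) pairs, a fixed extent size, and a hypothetical `pe_start` offset where extents begin on each PV.

```python
from typing import Sequence, Tuple


def lv_offset_to_pv(lv_offset: int, extent_size: int,
                    le_map: Sequence[Tuple[int, int]], pe_start: int) -> Tuple[int, int]:
    """Translate a byte offset in the LV into (PV index, byte offset on that PV)."""
    le, within = divmod(lv_offset, extent_size)
    pv, pe = le_map[le]  # logical extent `le` is physical extent `pe` on PV `pv`
    return pv, pe_start + pe * extent_size + within


def read_lv(pvs: Sequence[bytes], lv_offset: int, length: int, extent_size: int,
            le_map: Sequence[Tuple[int, int]], pe_start: int) -> bytes:
    """Read `length` bytes from the LV, splitting the request at extent boundaries."""
    out = bytearray()
    while length > 0:
        pv, pv_off = lv_offset_to_pv(lv_offset, extent_size, le_map, pe_start)
        step = min(length, extent_size - lv_offset % extent_size)  # stay inside one extent
        out += pvs[pv][pv_off:pv_off + step]
        lv_offset += step
        length -= step
    return bytes(out)
```

A damaged entry in the extent table shows up here as reads landing on the wrong PV or past its end, which is the kind of interpreter error step 6 had to track down.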
6. Repair the LVM logical volume.
Our developers analyzed the cause of the error and debugged the point where it occurred, while a senior file system engineer examined the recovered LUNs to determine whether the LVM logical volume information had been damaged when the storage became paralyzed. Careful inspection confirmed that the storage failure had indeed corrupted the LVM information. The engineers repaired the damaged areas manually and modified the LVM interpreter to re-parse the LVM logical volumes.
7. Parse the VXFS file system.
We set up an HP-UX environment, mapped the parsed LV volume to it, and tried to mount the file system, but the mount failed with an error. We then ran the "fsck -F vxfs" command to repair the VXFS file system, yet even after the repair completed the file system still could not be mounted. This suggested that some metadata in the underlying VXFS file system was damaged and had to be repaired manually.
8. Repair the VXFS file system.
We carefully analyzed the parsed LV and checked the file system's integrity against the underlying structures of the VXFS file system. The analysis confirmed that the underlying VXFS file system was damaged: the file system was performing I/O operations when the storage became paralyzed, so some file system metadata files were never updated and became corrupted. After these corrupted metadata files were repaired manually, the VXFS file system parsed normally. We mapped the repaired LV volume to the HP-UX machine again, and this time the file system mounted without errors.
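A first structural check on a parsed LV is whether a VxFS superblock can be found at all. The sketch below scans 1 KiB-aligned offsets near the start of a volume image for the VxFS superblock magic number 0xa501FCF5 (the value the Linux freevxfs driver uses), in both byte orders, since the order depends on the platform and HP-UX is big-endian. The scan window, alignment, and function name are assumptions, and finding the magic says nothing yet about whether the rest of the metadata is intact.

```python
import struct

VXFS_SUPER_MAGIC = 0xA501FCF5  # VxFS superblock magic number


def find_vxfs_superblocks(image: bytes, limit: int = 64 * 1024, step: int = 1024):
    """Return (offset, byte_order) for each step-aligned offset that starts with the magic."""
    hits = []
    end = min(len(image), limit)
    for off in range(0, end - 3, step):  # every offset with 4 bytes available
        word = image[off:off + 4]
        if struct.unpack(">I", word)[0] == VXFS_SUPER_MAGIC:
            hits.append((off, "big"))
        elif struct.unpack("<I", word)[0] == VXFS_SUPER_MAGIC:
            hits.append((off, "little"))
    return hits
```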
9. Restore all user files.
After mounting the file system on the HP-UX machine, we backed up all user data to the designated disk space. Screenshots of some file directories are as follows:

10. Check whether the database file is complete.
We used the Oracle database file verification tool "dbv" to check whether each database file was complete, and it reported no errors. However, Beiya's in-house Oracle database checking tool found that some database files and log files were inconsistent during verification. The database engineers repaired these files and re-verified them until every file passed verification.
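Block-level consistency checks are what flag files like these. The sketch is a rough stand-in for one such check, not a reimplementation of dbv or of Beiya's tool. It assumes, based on widely published descriptions of the Oracle block format, that when block checksums are in use the header's checksum field is set so that XOR-ing every 16-bit word of a block yields zero; it lists the blocks where that fails. The default block size of 8192 and the function names are assumptions.

```python
import struct


def xor16(block: bytes) -> int:
    """XOR of all 16-bit words in the block (zero or non-zero regardless of byte order)."""
    acc = 0
    for (word,) in struct.iter_unpack(">H", block):
        acc ^= word
    return acc


def blocks_failing_xor(datafile: bytes, block_size: int = 8192):
    """Indices of blocks whose 16-bit XOR is not zero."""
    count = len(datafile) // block_size
    return [i for i in range(count)
            if xor16(datafile[i * block_size:(i + 1) * block_size]) != 0]
```

All-zero (never-used) blocks pass this check trivially, so a real tool also looks at block headers and cross-checks against redo and control files.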
11. Start the Oracle database.
The HP-UX environment at our data recovery center did not have this version of the Oracle database, so we arranged with the user to bring the original environment to the North Asia data recovery center. We then attached the restored Oracle database to the HP-UX server and tried to start it, and the startup succeeded. Some screenshots are as follows:

12. Data verification.
With the user's cooperation, we started the Oracle database and the OA server, and installed the OA client on a local computer. The latest and historical data records were checked through the OA client, and staff from different departments were arranged to verify the data remotely. The final verification confirmed that the data was correct and complete, and the data recovery was successful.
Data recovery conclusion:
Because the site was well preserved after the failure and no risky operations were performed, the later data recovery was much easier. Although the recovery ran into many technical bottlenecks, each was resolved in turn. The entire recovery was completed within the expected time, the user was quite satisfied with the recovered data, and all services, including the Oracle database service and the OA server, started normally.