[Server data recovery] Data recovery case: LVM information and VXFS file system corruption caused by a RAID5 crash

2022-08-11 00:50:00 【North Asia Data Recovery】

Server data recovery environment:
Seven of the eight SAS hard disks form a RAID5 array, and one is used as a hot spare.

Server failure:
Two hard disks in the RAID5 array of the server's storage were reported as damaged and dropped offline. The array collapsed, and the upper-layer LUNs became unusable. The administrator contacted our data recovery center. Our hardware engineers tested the hard disks and found no physical faults or bad sectors.

Server data recovery process:
1. Back up the data. Create a full image of every disk using a data recovery tool.
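The imaging step can be sketched as a chunked copy that also records a checksum of what was read. This is a minimal illustration, not the tool used in this case: the function name `image_disk` and the chunk size are assumptions, and a real imaging tool would also have to handle read errors, which were not an issue here because the disks had no bad sectors.

```python
import hashlib


def image_disk(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy src_path to dst_path chunk by chunk.

    Returns (bytes_copied, sha256_hex) so the image can be checked later.
    """
    digest = hashlib.sha256()
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # end of the source device or file
                break
            dst.write(chunk)
            digest.update(chunk)
            copied += len(chunk)
    return copied, digest.hexdigest()
```

Keeping the checksum alongside each image makes it possible to confirm later that an image has not changed during analysis.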

2. Analyze the RAID structure.
The LUNs of the failed server all sit on top of the RAID, so the underlying RAID must be analyzed first and the original array reconstructed from the parameters recovered. The analysis showed that disk No. 4 was the hot spare. Examining how Oracle database pages are distributed across the disks yields the key parameters of the RAID group: stripe size, disk order, and data direction.
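To show why stripe size, disk order, and data direction are enough to rebuild the array, here is a minimal sketch that maps a logical byte offset to a member disk and an offset on that disk. The case does not say which parity rotation the controller used, so the two common layouts below (left-symmetric and left-asymmetric), the function names, and the 64 KiB stripe unit in the usage note are all assumptions for illustration.

```python
def locate_chunk(chunk_index, n_disks, layout="left-symmetric"):
    """Return (data_disk, row, parity_disk) for a logical stripe unit."""
    data_per_row = n_disks - 1            # one unit per row holds parity
    row, j = divmod(chunk_index, data_per_row)
    # "Left" layouts start parity on the last disk and move it left each row.
    parity = (n_disks - 1) - (row % n_disks)
    if layout == "left-symmetric":
        # Data starts right after the parity disk and wraps around.
        disk = (parity + 1 + j) % n_disks
    elif layout == "left-asymmetric":
        # Data fills disks in order, skipping the parity disk.
        disk = j if j < parity else j + 1
    else:
        raise ValueError(f"unknown layout: {layout}")
    return disk, row, parity


def logical_to_physical(offset, n_disks, stripe_unit, layout="left-symmetric"):
    """Map a logical byte offset to (disk_index, byte_offset_on_disk)."""
    chunk_index, within = divmod(offset, stripe_unit)
    disk, row, _ = locate_chunk(chunk_index, n_disks, layout)
    return disk, row * stripe_unit + within
```

For a seven-disk array with a hypothetical 64 KiB stripe unit, `logical_to_physical(6 * 65536 + 10, 7, 65536)` lands on disk 6 at offset 65546 under the left-symmetric layout. The disk index here is the position in the recovered disk order, not the slot number.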

3. Identify the disk that dropped out of the RAID first.
Using the recovered RAID parameters, reconstruct the original array virtually with the RAID virtualization program developed in-house by Beiya. Then examine the data on each hard disk closely, verify the stripes with Beiya's in-house RAID verification program, and exclude from the array the hard disk that dropped offline first.
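Stripe verification in RAID5 rests on one property: the XOR of all units in a stripe row, parity included, is zero. The sketch below is not Beiya's verification program. It only illustrates that property, how rows that fail the check can be listed, and how the unit of an excluded disk is rebuilt from the remaining ones. All function names are assumptions.

```python
def xor_blocks(blocks):
    """Byte-wise XOR of equally sized byte blocks."""
    length = len(blocks[0])
    acc = 0
    for block in blocks:
        if len(block) != length:
            raise ValueError("all blocks must have the same length")
        acc ^= int.from_bytes(block, "big")
    return acc.to_bytes(length, "big")


def row_is_consistent(blocks):
    """True when data units plus parity XOR to all zero bytes."""
    return not any(xor_blocks(blocks))


def inconsistent_rows(rows):
    """Indices of stripe rows whose parity check fails."""
    return [i for i, row in enumerate(rows) if not row_is_consistent(row)]


def rebuild_missing(remaining_blocks):
    """Rebuild the unit of the excluded disk from all other units in the row."""
    return xor_blocks(remaining_blocks)
```

Rows written after the first disk dropped out fail this check when that disk's stale data is included. Single parity alone does not say which member is stale, so the disk contents still have to be examined, for example against the Oracle page structure mentioned earlier.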

4. Analyze the LUN information in the RAID group.
After virtualizing the RAID in its latest state, analyze how the LUNs are allocated within it and locate each LUN's data block map. Only the block distribution maps of the six underlying LUNs need to be extracted. A program written against this information then parses the data maps of all LUNs and exports each LUN's data accordingly.
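The export driven by a block map can be sketched as follows. The on-disk format of the storage's LUN map is not described in this case, so the map is modeled here as a hypothetical list of `(lun_offset, raid_offset, length)` extents, and `export_lun` is an illustrative name rather than the program actually written.

```python
def export_lun(raid, extent_map, out, chunk_size=1024 * 1024):
    """Copy a LUN out of the virtual RAID according to its extent map.

    raid and out are seekable binary file objects; extent_map is a list of
    (lun_offset, raid_offset, length) tuples in bytes (hypothetical format).
    """
    for lun_offset, raid_offset, length in sorted(extent_map):
        raid.seek(raid_offset)
        out.seek(lun_offset)
        remaining = length
        while remaining > 0:
            chunk = raid.read(min(chunk_size, remaining))
            if not chunk:
                raise EOFError("extent runs past the end of the RAID image")
            out.write(chunk)
            remaining -= len(chunk)
```

Because each extent is written at its own LUN offset, extents may appear in the map in any order and need not be contiguous on the RAID.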

5. Parse the LVM logical volume.
Analysis of the exported LUNs showed that every one of them contains HP-UX LVM logical volume information. Parsing the LVM information in each LUN revealed three sets of LVM: one set holds a single LV that stores the OA server's data, a second set holds a single LV that stores temporary backup data, and the remaining four LUNs form a third set with a single LV that stores the Oracle database files. A Beiya data recovery engineer wrote a program to interpret the LVM metadata and tried to interpret the LV in each set, but the interpreter reported errors.
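Interpreting an LV comes down to translating each logical extent into a physical extent on one of the physical volumes (here, the LUNs). The sketch below shows only that translation step under simplifying assumptions: the extent table is taken as already parsed, and `extent_table`, `extent_size`, and `data_start` are hypothetical names, not fields of the HP-UX LVM on-disk format.

```python
def lv_to_pv(lv_offset, extent_table, extent_size, data_start=0):
    """Translate a byte offset in an LV to (pv_index, byte_offset_on_pv).

    extent_table[i] is (pv_index, physical_extent_number) for logical
    extent i; data_start is where extent 0 begins on each PV.
    """
    logical_extent, within = divmod(lv_offset, extent_size)
    if logical_extent >= len(extent_table):
        raise IndexError("offset is beyond the end of the logical volume")
    pv_index, physical_extent = extent_table[logical_extent]
    return pv_index, data_start + physical_extent * extent_size + within
```

If the extent table itself is damaged, as turned out to be the case here, this translation returns wrong locations or fails, which is consistent with an interpreter that reports errors.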

6. Repair the LVM logical volume.
Analyze the cause of the error carefully. A development engineer debugged the interpreter to find where it failed, while a senior file system engineer examined the recovered LUNs to determine whether the LVM information had been damaged when the storage crashed. The inspection confirmed that the crash had indeed corrupted the LVM information. Repair the damaged areas manually, then modify the LVM interpreter and parse the LVM logical volumes again.

7. Parse the VXFS file system.
Set up an HP-UX environment, map the interpreted LV to the HP-UX host, and try to mount the file system. The mount failed. The command "fsck -F vxfs" was then used to repair the VXFS file system, but the file system still could not be mounted after the repair finished. This suggested that some of the underlying VXFS metadata was damaged and had to be repaired manually.
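Before attempting a mount, a quick sanity check on the interpreted LV is to look for the VXFS superblock magic number. The sketch below is an illustration under stated assumptions: the magic value `0xA501FCF5` and the 8 KiB big-endian superblock location follow the HP-UX probe in the Linux freevxfs driver, and should be verified against the actual volume rather than taken as given.

```python
import struct

VXFS_MAGIC = 0xA501FCF5        # assumed, per the Linux freevxfs driver
HPUX_SB_OFFSET = 8 * 1024      # assumed superblock offset on HP-UX volumes


def has_vxfs_superblock(volume, offset=HPUX_SB_OFFSET):
    """Return True if a big-endian VXFS magic number sits at the offset.

    volume is a seekable binary file object for the LV image.
    """
    volume.seek(offset)
    raw = volume.read(4)
    if len(raw) < 4:           # image too short to hold a superblock
        return False
    (magic,) = struct.unpack(">I", raw)
    return magic == VXFS_MAGIC
```

A passing check says only that the volume starts where it should. It says nothing about the state of the other metadata, which is what was damaged in this case.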

8. Repair the VXFS file system.
Analyze the parsed LV carefully and check the file system against the on-disk structure of VXFS. The analysis confirmed that the file system was damaged: it was in the middle of I/O when the storage crashed, so some file system metafiles were never updated and ended up corrupted. After these metafiles were repaired manually, the VXFS file system could be parsed normally. The repaired LV was attached to the HP-UX host again, and this time the file system mounted without errors.

9. Restore all user files.
After mounting the file system on the HP-UX host, back up all user data to the designated disk space.

10. Check whether the database file is complete.
Check every database file with "dbv", the Oracle database file verification tool; it reported no errors. A further check with the Oracle database inspection tool developed in-house by Beiya found that some database files and log files were inconsistent with each other. The database engineer repaired these files and verified them again until every file passed.
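Running "dbv" over many data files is easy to script. The sketch below builds the command line from the `file`, `blocksize`, and `logfile` parameters of DBVERIFY. The helper names, the 8192-byte default block size, and the path in the usage note are assumptions for illustration, since the case does not state the database block size or file locations.

```python
import subprocess


def dbv_command(datafile, blocksize=8192, logfile=None):
    """Build the dbv command line for one data file."""
    cmd = ["dbv", f"file={datafile}", f"blocksize={blocksize}"]
    if logfile:
        cmd.append(f"logfile={logfile}")
    return cmd


def run_dbv(datafile, blocksize=8192, logfile=None):
    """Run dbv and return the completed process for inspection."""
    return subprocess.run(
        dbv_command(datafile, blocksize, logfile),
        capture_output=True,
        text=True,
        check=False,  # the caller reads the report and decides what failed
    )
```

For a hypothetical file, `dbv_command("/u01/oradata/users01.dbf")` yields `["dbv", "file=/u01/oradata/users01.dbf", "blocksize=8192"]`. dbv examines data files individually, which is why the cross-file inconsistencies described above were only found by a separate check.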

11. Start the Oracle database.
The HP-UX environment at our data recovery center did not have this version of Oracle installed, so we arranged with the user to bring the original environment to the North Asia data recovery center. The recovered Oracle database was then attached to the original HP-UX server and started successfully.

12. Data verification.
With the user's cooperation, start the Oracle database, start the OA server, and install the OA client on a local computer. The latest and historical data records were verified through the OA client, and staff from different departments were arranged to verify the data remotely. The verification found the data correct and complete, so the recovery was successful.

Data recovery conclusion:
Because the site was well preserved after the failure and no risky operations were performed, the later recovery work was made much easier. Many technical obstacles came up during the recovery, but all of them were resolved one by one. The whole recovery was completed within the expected time, the user was satisfied with the recovered data, and all services, including the Oracle database and the OA server, started normally.

Copyright notice
This article was written by [North Asia Data Recovery]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/223/202208110046375487.html
