Title: Strange problem with xfs, or bad sectors?!
Posted by: amarth on Aug 10, 2011, 19:45

Hello,
I was checking what's in /raid0/mysql/ and this is what I ran into:

serv07:~# ls -la /raid0/mysql/
ls: reading directory /raid0/mysql/: Input/output error
total 0

and that directory used to hold several databases. Some info about the machine: Debian 5.0.1, kernel 2.6.20.2

serv07:~# fdisk -l

Disk /dev/hda: 160.0 GB, 160041885696 bytes
255 heads, 63 sectors/track, 19457 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0xae7a3a28

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1               1        1246    10008463+  fd  Linux raid autodetect
/dev/hda2            1247        1369      987997+  82  Linux swap / Solaris
/dev/hda3            1370       19457   145291860   fd  Linux raid autodetect

Disk /dev/hdc: 160.0 GB, 160041885696 bytes
255 heads, 63 sectors/track, 19457 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/hdc1               1        1246    10008463+  fd  Linux raid autodetect
/dev/hdc2            1247        1369      987997+  82  Linux swap / Solaris
/dev/hdc3            1370       19457   145291860   fd  Linux raid autodetect

Disk /dev/md3: 297.5 GB, 297557557248 bytes
2 heads, 4 sectors/track, 72645888 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk identifier: 0x00000000

Disk /dev/md2: 10.2 GB, 10248585216 bytes
2 heads, 4 sectors/track, 2502096 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk identifier: 0x00000000

serv07:~# cat /etc/fstab
proc        /proc    proc   defaults   0   0
/dev/md2    /        xfs    defaults   0   1
/dev/md3    /raid0   xfs    defaults   0   2
/dev/hda2   none     swap   sw         0   0
/dev/hdc2   none     swap   sw         0   0

serv07:~# mdadm --detail /dev/md3
/dev/md3:
        Version : 00.90
  Creation Time : Fri Nov 17 19:09:14 2006
     Raid Level : raid0
     Array Size : 290583552 (277.12 GiB 297.56 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 3
    Persistence : Superblock is persistent

    Update Time : Wed Aug 10 16:53:23 2011
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 64K

           UUID : e047348c:50fcbfd0:2636a61a:93c132ec
         Events : 0.15

    Number   Major   Minor   RaidDevice State
       0       3        3        0      active sync   /dev/hda3
       1      22        3        1      active sync   /dev/hdc3

The result of checking md3:

serv07:~# xfs_check /dev/md3
can't read btree block 1/14884
can't read block 0 for directory inode 139741471
no . entry for directory 139741471
no .. entry for directory 139741471
/usr/sbin/xfs_check: line 28:  4107 Segmentation fault      xfs_db$DBOPTS -i -p xfs_check -c "check$OPTS" $1

I also tried repair, but it had no effect :(

serv07:~# xfs_repair /dev/md3
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
xfs_repair: read failed: Input/output error
fatal error -- can't read btree block 1/14884

The RAID itself works; the problem is only with this one directory, i.e. all the data in /raid0/* is fine except what is in /raid0/mysql/. If anyone has ideas on how I can recover what was in /raid0/mysql/, I'll be very grateful.

Title: Re: Strange problem with xfs, or bad sectors?!
Posted by: vyrgozunqk on Aug 10, 2011, 20:41

The easiest (though not very precise) option is to look at the disk's SMART info... Then let's see what it says... But yes, it smells like bad sectors.
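A side note on the addresses above: xfs_check and xfs_repair report the unreadable metadata block as 1/14884, i.e. allocation group 1, block 14884 inside that group. A minimal sketch of how such an address could be translated into a sector on one of md3's member disks, so it can be compared with the failing LBAs that SMART reports. The member order (hda3 = RaidDevice 0, hdc3 = RaidDevice 1) and the 64K chunk come from the mdadm output; the partition start is recomputed from the fdisk cylinder numbers. The 4096-byte block size and the agblocks value are assumptions, since neither appears in the thread (`xfs_db -r -c "sb 0" -c "p blocksize" -c "p agblocks" /dev/md3` should print the real ones).

```python
# Sketch: XFS "agno/agbno" -> sector on /dev/md3 -> RAID0 member sector -> disk LBA.
# Assumptions: 512-byte sectors; equal-size members with the md 0.90 superblock
# at the end, so array data starts at sector 0 of each member; partition 3
# starting at cylinder 1370 as printed by `fdisk -l` for both disks.

SECTOR_SIZE = 512
CHUNK_SECTORS = 64 * 1024 // SECTOR_SIZE   # "Chunk Size : 64K" -> 128 sectors
MEMBERS = ("hda3", "hdc3")                 # RaidDevice 0 and 1 per mdadm --detail
PART3_START_LBA = (1370 - 1) * 255 * 63    # 21992985, recomputed from cylinders


def xfs_block_to_md_sector(agno, agbno, agblocks, blocksize=4096):
    """Return the 512-byte sector on /dev/md3 where XFS block agno/agbno starts."""
    fs_block = agno * agblocks + agbno     # allocation groups are laid out back to back
    return fs_block * blocksize // SECTOR_SIZE


def raid0_member_sector(md_sector, members=MEMBERS, chunk=CHUNK_SECTORS):
    """Map an md RAID0 sector to (member name, sector inside that member)."""
    chunk_no, offset = divmod(md_sector, chunk)
    stripe, index = divmod(chunk_no, len(members))  # chunks rotate across members
    return members[index], stripe * chunk + offset


def member_to_disk_lba(member_sector, part_start=PART3_START_LBA):
    """Absolute LBA on the whole disk, comparable with SMART's "UNC at LBA"."""
    return part_start + member_sector


if __name__ == "__main__":
    AGBLOCKS = 4540368  # hypothetical placeholder; read the real value with xfs_db
    md_sector = xfs_block_to_md_sector(1, 14884, AGBLOCKS)
    member, sector = raid0_member_sector(md_sector)
    print(member, sector, member_to_disk_lba(sector))
```

The placeholder 4540368 is simply 72645888 (the number of 4 KiB blocks fdisk reports for md3) divided by 16 allocation groups; the real agcount may differ. A 4 KiB block is 8 sectors, so with 128-sector chunks it never straddles two members.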
Also, I'm not familiar with XFS, but if fsck supports it, you can try running it. It usually complains loudly about bad sectors and even offers to try rewriting them; if rewriting the block fails, the disk reallocates it on its own...

Title: Re: Strange problem with xfs, or bad sectors?!
Posted by: aleximilian on Aug 10, 2011, 23:14

A while ago I went through a similar melodrama.
If you can, make a copy of the disk with dd or dd_rescue, and then see how much testdisk can help you.

Title: Re: Strange problem with xfs, or bad sectors?!
Posted by: amarth on Aug 11, 2011, 01:47

fsck isn't supported; here is the SMART output:
serv07:~# smartctl -a /dev/hdc
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.7 and 7200.7 Plus family
Device Model:     ST3160023A
Serial Number:    5JS3RRW0
Firmware Version: 8.01
User Capacity:    160,041,885,696 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   6
ATA Standard is:  ATA/ATAPI-6 T13 1410D revision 2
Local Time is:    Thu Aug 11 01:15:57 2011 EEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: ( 430) seconds.
Offline data collection capabilities:    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine recommended polling time:    (   1) minutes.
Extended self-test routine recommended polling time: ( 111) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   054   053   006    Pre-fail  Always       -       4393832
  3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       3
  5 Reallocated_Sector_Ct   0x0033   099   099   036    Pre-fail  Always       -       47
  7 Seek_Error_Rate         0x000f   079   060   030    Pre-fail  Always       -       86523270
  9 Power_On_Hours          0x0032   078   078   000    Old_age   Always       -       20049
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       134
194 Temperature_Celsius     0x0022   037   059   000    Old_age   Always       -       37
195 Hardware_ECC_Recovered  0x001a   054   052   000    Old_age   Always       -       4393832
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       10
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       10
199 UDMA_CRC_Error_Count    0x003e   200   199   000    Old_age   Always       -       2
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   017   170   000    Old_age   Always       -       83

SMART Error Log Version: 1
ATA Error Count: 2890 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 2890 occurred at disk power-on lifetime: 20044 hours (835 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 91 c6 79 e0  Error: UNC 8 sectors at LBA = 0x0079c691 = 7980689

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 91 c6 79 e0 00      20:31:43.955  READ DMA EXT
  25 00 08 91 c6 79 e0 00      20:31:39.968  READ DMA EXT
  25 00 10 89 e1 64 e0 00      20:31:39.954  READ DMA EXT
  25 00 68 19 e1 64 e0 00      20:31:39.951  READ DMA EXT
  25 00 80 99 e0 64 e0 00      20:31:39.948  READ DMA EXT

Error 2889 occurred at disk power-on lifetime: 20044 hours (835 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 91 c6 79 e0  Error: UNC 8 sectors at LBA = 0x0079c691 = 7980689

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 91 c6 79 e0 00      20:31:39.884  READ DMA EXT
  25 00 10 89 e1 64 e0 00      20:31:39.968  READ DMA EXT
  25 00 68 19 e1 64 e0 00      20:31:39.954  READ DMA EXT
  25 00 80 99 e0 64 e0 00      20:31:39.951  READ DMA EXT
  25 00 80 19 e0 64 e0 00      20:31:39.948  READ DMA EXT

Error 2888 occurred at disk power-on lifetime: 20044 hours (835 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 b9 9d 65 e0  Error: UNC 8 sectors at LBA = 0x00659db9 = 6659513

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 b9 9d 65 e0 00      20:31:31.108  READ DMA EXT
  ea 00 00 00 00 00 00 00      20:31:31.099  FLUSH CACHE EXT
  35 00 08 bf 6e 31 e0 00      20:31:31.091  WRITE DMA EXT
  ea 00 00 00 00 00 00 00      20:31:31.084  FLUSH CACHE EXT
  25 00 08 b9 9d 65 e0 00      20:31:31.084  READ DMA EXT

Error 2887 occurred at disk power-on lifetime: 20044 hours (835 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 b9 9d 65 e0  Error: UNC 8 sectors at LBA = 0x00659db9 = 6659513

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 b9 9d 65 e0 00      20:31:31.108  READ DMA EXT
  25 00 08 09 cc 64 e0 00      20:31:31.099  READ DMA EXT
  25 00 08 71 f3 64 e0 00      20:31:31.091  READ DMA EXT
  25 00 08 11 cc 64 e0 00      20:31:31.084  READ DMA EXT
  25 00 08 e9 cf 64 e0 00      20:31:31.084  READ DMA EXT

Error 2886 occurred at disk power-on lifetime: 20044 hours (835 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 91 c6 79 e0  Error: UNC 8 sectors at LBA = 0x0079c691 = 7980689

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 91 c6 79 e0 00      20:15:51.154  READ DMA EXT
  ea 00 00 00 00 00 00 00      20:15:51.150  FLUSH CACHE EXT
  35 00 08 bf 6e 31 e0 00      20:15:51.147  WRITE DMA EXT
  ea 00 00 00 00 00 00 00      20:15:51.141  FLUSH CACHE EXT
  25 00 08 91 c6 79 e0 00      20:15:51.138  READ DMA EXT

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

and from /dev/hda:

serv07:~# smartctl -a /dev/hda
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
...
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   063   054   006    Pre-fail  Always       -       155371110
  3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       12
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       4
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       312341655
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       5072
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       185
194 Temperature_Celsius     0x0022   031   060   000    Old_age   Always       -       31
195 Hardware_ECC_Recovered  0x001a   063   054   000    Old_age   Always       -       155371110
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   199   000    Old_age   Always       -       2
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0
...
Error 16 occurred at disk power-on lifetime: 5045 hours (210 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 d4 25 2c e0  Error: UNC at LBA = 0x002c25d4 = 2893268

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 3f 25 2c e0 00      00:12:26.047  READ DMA EXT
  25 00 00 3f 21 2c e0 00      00:12:26.020  READ DMA EXT
  25 00 00 3f 1d 2c e0 00      00:12:26.427  READ DMA EXT
  25 00 00 3f 19 2c e0 00      00:12:26.400  READ DMA EXT
  25 00 00 3f 15 2c e0 00      00:12:26.374  READ DMA EXT

Error 15 occurred at disk power-on lifetime: 4998 hours (208 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 b5 75 46 e0  Error: UNC at LBA = 0x004675b5 = 4617653

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 ae 75 46 e0 00      14:09:14.685  READ DMA EXT
  25 00 10 a6 75 46 e0 00      14:09:14.681  READ DMA EXT
  25 00 08 9e 75 46 e0 00      14:09:14.680  READ DMA EXT
  25 00 40 9e f3 46 e0 00      14:09:14.676  READ DMA EXT
  25 00 08 4e 75 46 e0 00      14:09:14.669  READ DMA EXT

Error 14 occurred at disk power-on lifetime: 4998 hours (208 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 b5 75 46 e0  Error: UNC at LBA = 0x004675b5 = 4617653

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 10 a6 75 46 e0 00      14:09:14.685  READ DMA EXT
  25 00 08 9e 75 46 e0 00      14:09:14.681  READ DMA EXT
  25 00 40 9e f3 46 e0 00      14:09:14.680  READ DMA EXT
  25 00 08 4e 75 46 e0 00      14:09:14.676  READ DMA EXT
  25 00 08 3e 75 46 e0 00      14:09:14.669  READ DMA EXT

Error 13 occurred at disk power-on lifetime: 4998 hours (208 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 b5 75 46 e0  Error: UNC at LBA = 0x004675b5 = 4617653

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 ae 75 46 e0 00      14:08:46.386  READ DMA EXT
  25 00 10 a6 75 46 e0 00      14:08:42.163  READ DMA EXT
  25 00 18 9e 75 46 e0 00      14:08:42.150  READ DMA EXT
  25 00 78 b7 a0 63 e0 00      14:08:42.149  READ DMA EXT
  25 00 38 e6 3d 46 e0 00      14:08:42.142  READ DMA EXT

Error 12 occurred at disk power-on lifetime: 4998 hours (208 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 b5 75 46 e0  Error: UNC at LBA = 0x004675b5 = 4617653

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 10 a6 75 46 e0 00      14:08:46.386  READ DMA EXT
  25 00 18 9e 75 46 e0 00      14:08:42.163  READ DMA EXT
  25 00 78 b7 a0 63 e0 00      14:08:42.150  READ DMA EXT
  25 00 38 e6 3d 46 e0 00      14:08:42.149  READ DMA EXT
  25 00 08 de 3d 46 e0 00      14:08:42.142  READ DMA EXT

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

I don't think dd_rescue can help me, since the directory in question, /raid0/mysql/, is on RAID 0.

Title: Re: Strange problem with xfs, or bad sectors?!
Posted by: Naka on Aug 11, 2011, 10:54

Ha... you really went for it: a database on RAID 0!
Throw that ST3160023A disk away. These two lines show that it has 20 bad sectors (10 from one line and another 10 from the other):

Code:
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       10
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       10

and this line:

Code:
  5 Reallocated_Sector_Ct   0x0033   099   099   036    Pre-fail  Always       -       47

shows that there were another 47 bad sectors earlier, which the disk managed to replace with good ones. The disk started dying a long time ago, and it has 20,000 hours on it as well.

Title: Re: Strange problem with xfs, or bad sectors?!
Posted by: amarth on Aug 11, 2011, 12:10

Is there any way to recover the data?
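Naka's reading of the attribute table can be automated. A minimal sketch, assuming the 10-column `smartctl -a` layout shown in this thread (ID#, name, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE) and integer raw values, that pulls out attributes 5, 197 and 198 and reports the non-zero ones. One caveat: on many drives Current_Pending_Sector and Offline_Uncorrectable count overlapping sets of sectors, so adding them gives an upper bound rather than an exact count of distinct bad sectors.

```python
from typing import Dict

# Attributes that track unreadable or remapped sectors.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def parse_smart_attributes(text: str) -> Dict[int, int]:
    """Return {attribute ID: raw value} from `smartctl -a` / `smartctl -A` text."""
    raw: Dict[int, int] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        # Attribute rows have a hex FLAG in column 3; this also skips the
        # register dumps in the error log, which start with hex bytes too.
        if not fields[2].startswith("0x") or not fields[9].isdigit():
            continue
        raw[int(fields[0])] = int(fields[9])
    return raw


def sector_health(text: str) -> Dict[str, int]:
    """Non-zero reallocated / pending / offline-uncorrectable counters."""
    raw = parse_smart_attributes(text)
    return {name: raw[attr] for attr, name in WATCHED.items() if raw.get(attr, 0) > 0}
```

Fed the /dev/hdc output above, this reports 47, 10 and 10; for /dev/hda only Reallocated_Sector_Ct = 4 is non-zero.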
Title: Re: Strange problem with xfs, or bad sectors?!
Posted by: Naka on Aug 11, 2011, 15:03

No.
Where it won't read (where a file lands on a bad sector) there is nothing to be done; that information is already lost. But the files that sit on good sectors can be archived or copied somewhere else (another medium outside the array). Your only salvation is to copy every file that can still be read to another hard drive, or to archive them.
-----------------
I'm surprised the array hasn't kicked the disk out yet. But I don't know how RAID 0 behaves here. Maybe with RAID 0 a disk isn't kicked out for bad sectors, because then you'd be left with no data at all. ???

Title: Re: Strange problem with xfs, or bad sectors?!
Posted by: vyrgozunqk on Aug 11, 2011, 17:29

Try opening the disk through "mc". As far as I remember, it copies bit by bit and will simply skip the "broken bits" on the bad sectors. I've rescued photos and other things this way a few times, although afterwards some of the photos, for example, only rendered halfway... Nothing stops you from trying...
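The idea behind this advice (keep copying and let unreadable spots become gaps) fits in a few lines. This is not how mc works internally; it is a generic illustration, roughly what `dd conv=noerror,sync` does: read fixed-size blocks, and when a read fails with an I/O error, write zeros of the same length so every later byte keeps its offset. The function names, block size and paths are illustrative; for real disks GNU ddrescue does this far more carefully, retrying and narrowing down the failed areas.

```python
import os
from typing import Callable, List


def salvage(read_at: Callable[[int, int], bytes], size: int,
            write: Callable[[bytes], object], block: int = 64 * 1024) -> List[int]:
    """Copy `size` bytes block by block; return the offsets of blocks that failed.

    A block whose read raises OSError (e.g. EIO on a bad sector) is replaced
    by zeros, and a short read is zero-padded, so the output keeps the
    input's layout.
    """
    bad = []
    offset = 0
    while offset < size:
        length = min(block, size - offset)
        try:
            data = read_at(offset, length)
        except OSError:
            data = b""
            bad.append(offset)
        write(data.ljust(length, b"\0"))
        offset += length
    return bad


def salvage_file(src: str, dst: str, block: int = 64 * 1024) -> List[int]:
    """Salvage a regular file or block device (Unix only: uses os.pread)."""
    fd = os.open(src, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # also gives the size of a block device on Linux
        with open(dst, "wb") as out:
            return salvage(lambda off, n: os.pread(fd, n, off), size, out.write, block)
    finally:
        os.close(fd)
```

Padding instead of skipping matters: a copy with a zeroed hole still has every later byte in the right place, while a copy with bytes silently dropped shifts everything after the gap.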
Title: Re: Strange problem with xfs, or bad sectors?!
Posted by: laskov on Aug 11, 2011, 22:16

http://www.gnu.org/software/ddrescue/manual/ddrescue_manual.html
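GNU ddrescue records its progress in a mapfile (called a logfile in older releases), passed as the third argument, e.g. `ddrescue /dev/md3 md3.img md3.map`. That file is also a quick way to see how much of the source was actually unreadable. A minimal sketch that totals it by status, assuming the plain-text format described in the manual: `#` comment lines, one status line with the current position, then `pos size status` lines with hexadecimal numbers, where `+` means finished and `-` means bad sector(s). The sample data in the test is made up.

```python
from collections import Counter

# Block status characters used in GNU ddrescue mapfiles.
STATUS_NAMES = {
    "?": "non-tried",
    "*": "non-trimmed",
    "/": "non-scraped",   # "non-split" in older ddrescue versions
    "-": "bad-sector",
    "+": "finished",
}


def summarize_mapfile(text: str) -> Counter:
    """Total bytes per block status in a GNU ddrescue mapfile/logfile."""
    totals: Counter = Counter()
    status_line_seen = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if not status_line_seen:
            # First data line: current_pos current_status [current_pass]
            status_line_seen = True
            continue
        _pos, size, status = line.split()[:3]
        totals[status] += int(size, 16)   # sizes are written as 0x... hex
    return totals


def rescued_fraction(totals: Counter) -> float:
    """Share of the mapped area marked as finished ('+')."""
    total = sum(totals.values())
    return totals["+"] / total if total else 0.0
```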
Title: Re: Strange problem with xfs, or bad sectors?!
Posted by: amarth on Aug 12, 2011, 00:56

No luck with "mc"... I started ddrescue a little while ago; we'll see the result in the morning.
Title: Re: Strange problem with xfs, or bad sectors?!
Posted by: amarth on Aug 15, 2011, 14:44

I made a copy of /dev/md3 with dd_rescue, and then tried to recover the data with testdisk, photorec and mondo rescue...
Unfortunately, it was all unsuccessful :(
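For anyone retracing this: the UNC entries in the SMART error logs are absolute disk LBAs, so it may be worth checking which partition they fall in before attributing them to md3. A minimal sketch that recomputes the partition boundaries from the cylinder numbers in the `fdisk -l` output of the first post, assuming 255 * 63 = 16065 sectors per cylinder, the first partition starting at sector 63 and the others on cylinder boundaries (`fdisk -lu` prints the real start sectors and is the safer source). Per mdadm --detail, partition 3 of each disk is a member of md3.

```python
# Partition layout recomputed from `fdisk -l` cylinder numbers (identical on hda and hdc).
# Assumption: classic DOS layout, first partition at sector 63, others on cylinder boundaries.
SECTORS_PER_CYL = 255 * 63   # "255 heads, 63 sectors/track" -> 16065
LAYOUT = ((1, 1, 1246), (2, 1247, 1369), (3, 1370, 19457))  # (number, start cyl, end cyl)


def partition_ranges(layout=LAYOUT, spc=SECTORS_PER_CYL, first_track=63):
    """Return [(partition number, first LBA, last LBA)], inclusive."""
    ranges = []
    for number, start_cyl, end_cyl in layout:
        first = first_track if start_cyl == 1 else (start_cyl - 1) * spc
        last = end_cyl * spc - 1
        ranges.append((number, first, last))
    return ranges


def partition_for_lba(lba, layout=LAYOUT):
    """Partition number containing `lba`, or None if it lies outside all of them."""
    for number, first, last in partition_ranges(layout):
        if first <= lba <= last:
            return number
    return None
```

As a sanity check, the recomputed sizes match the Blocks column (partition 3: 290583720 sectors = 145291860 KiB). Under these assumptions the four LBAs logged above (7980689 and 6659513 on hdc, 2893268 and 4617653 on hda) fall in partition 1, outside md3; since the drive keeps only its five most recent errors, this does not rule out bad sectors inside md3.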