
RAID5 won't start: msg#00218

linux.raid

Subject: RAID5 won't start

[sorry about the crappy formatting - my real mail system is on the failed
array, and I'm forced to use the job email - yuck!]

I'll try to retrace the steps that got me into this problem...

When I built my external RAID cabinet, I was lacking a disk bracket
(what Sun calls drive spuds -
http://cgi.ebay.com/ws/eBayISAPI.dll?ViewItem&category=20328&item=5726798994&rd=1).
So instead I used cardboard to separate the disks. This has worked just
fine (for roughly 3-4 months).
Today I received my replacement spuds, and I thought I'd mount them on the
disks.

I removed the disk (mdadm md1 -f sdd1 -r sdd1), and then later put it back
in the exact same location (mdadm md1 -a sdd1)...

It took about half an hour to sync again. I had a one-liner that looked at
the output from "mdadm -D md1 | grep 'whatever the string was'" (to see how
far the rebuild had got). A couple of seconds/minutes (don't know exactly, I
had other things on my mind in another window :) after it reached 99%, it
seemed to have hung. Cat'ing /proc/mdstat also hung...
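
For reference, a rough sketch of the kind of one-liner meant above. It assumes the progress line in the "mdadm -D" output reads "Rebuild Status : NN% complete"; that string, the function name rebuild_percent and the sample text are assumptions for illustration, not the exact command that was used.

```shell
# Extract the rebuild percentage from `mdadm -D` style text on stdin.
# Assumption: the progress line looks like "Rebuild Status : 42% complete".
rebuild_percent() {
    awk -F: '/Rebuild Status/ {
        gsub(/[^0-9]/, "", $2)   # keep only the digits of the value field
        print $2
        exit
    }'
}

# Hypothetical sample text; on a live system the input would come from
# something like: mdadm -D /dev/md1
sample=' State : clean, degraded, recovering
 Rebuild Status : 99% complete'
printf '%s\n' "$sample" | rebuild_percent
```

On a running array this could be wrapped in a loop such as "while sleep 10; do mdadm -D /dev/md1 | rebuild_percent; done", though that would presumably hang just like /proc/mdstat did once the array got stuck.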

Starting a serial console, I saw a lot of stuff, but what caught my eye
was that it had finished syncing the md1 array.... Since I was in a rush,
I cycled the power (I know - DUMB!). Now it won't start the array...


----- s n i p -----
Number  Major  Minor  RaidDevice  State
   0       0      0      -1       removed
   1       8     81       1       active sync  /dev/scsi/host3/bus0/target8/lun0/part1
   2       8     97       2       active sync  /dev/scsi/host3/bus0/target9/lun0/part1
   3       8    241       3       active sync  /dev/scsi/host4/bus0/target4/lun0/part1
   4      65      1       4       active sync  /dev/scsi/host4/bus0/target5/lun0/part1
   5      65     17       5       active sync  /dev/scsi/host4/bus0/target8/lun0/part1
   6      65     33       6       active sync  /dev/scsi/host4/bus0/target9/lun0/part1
   7      65    113       7       active sync  /dev/scsi/host4/bus0/target14/lun0/part1
   8       0      0      -1       removed
   9       8     49      -1       spare        /dev/scsi/host3/bus0/target4/lun0/part1

sdf1 /dev/scsi/host3/bus0/target8/lun0/part1: device 1 in 9 device active raid5 md1.
sdg1 /dev/scsi/host3/bus0/target9/lun0/part1: device 2 in 9 device active raid5 md1.
sdp1 /dev/scsi/host4/bus0/target4/lun0/part1: device 3 in 9 device active raid5 md1.
sdq1 /dev/scsi/host4/bus0/target5/lun0/part1: device 4 in 9 device active raid5 md1.
sdr1 /dev/scsi/host4/bus0/target8/lun0/part1: device 5 in 9 device active raid5 md1.
sds1 /dev/scsi/host4/bus0/target9/lun0/part1: device 6 in 9 device active raid5 md1.
sdx1 /dev/scsi/host4/bus0/target14/lun0/part1: device 7 in 9 device active raid5 md1.
sdd1 /dev/scsi/host3/bus0/target4/lun0/part1: device 9 in 9 device active raid5 md1.

sdd1: Update Time : Mon Oct 25 09:19:09 2004
sdx1: Update Time : Mon Oct 25 07:37:42 2004
sds1: Update Time : Mon Oct 25 09:19:09 2004
sdr1: Update Time : Mon Oct 25 09:19:09 2004
sdq1: Update Time : Mon Oct 25 09:19:09 2004
sdp1: Update Time : Mon Oct 25 09:19:09 2004
sdg1: Update Time : Mon Oct 25 09:19:09 2004
sdf1: Update Time : Mon Oct 25 09:19:09 2004

md1 : inactive sdf1[1] sdd1[9] sdx1[7] sds1[6] sdr1[5] sdq1[4] sdp1[3] sdg1[2]
141763072 blocks
----- s n i p -----
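
The Update Time lines in the listing above are easier to compare mechanically than by eye. A minimal sketch, assuming input in exactly the "device: Update Time : timestamp" form shown there; the function name stale_members is made up for illustration. It prints every member whose timestamp differs from the most common one.

```shell
# Print members whose Update Time differs from the most common one.
# Input lines look like: "sdx1: Update Time : Mon Oct 25 07:37:42 2004"
stale_members() {
    awk '{
        dev = $1; sub(/:$/, "", dev)               # "sdx1:" -> "sdx1"
        ts = $0;  sub(/^.*Update Time : /, "", ts) # keep only the timestamp
        time_of[dev] = ts
        count[ts]++
    }
    END {
        # find the timestamp shared by the most members
        for (t in count) if (count[t] > best) { best = count[t]; common = t }
        # report every member that does not carry it
        for (d in time_of) if (time_of[d] != common) print d
    }'
}

stale_members <<'EOF'
sdd1: Update Time : Mon Oct 25 09:19:09 2004
sdx1: Update Time : Mon Oct 25 07:37:42 2004
sds1: Update Time : Mon Oct 25 09:19:09 2004
sdr1: Update Time : Mon Oct 25 09:19:09 2004
sdq1: Update Time : Mon Oct 25 09:19:09 2004
sdp1: Update Time : Mon Oct 25 09:19:09 2004
sdg1: Update Time : Mon Oct 25 09:19:09 2004
sdf1: Update Time : Mon Oct 25 09:19:09 2004
EOF
```

With the values above this prints sdx1, the only member whose superblock was last updated at 07:37:42 instead of 09:19:09. Whether that difference matters for the failed start is not something the sketch can tell.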

The problem here is that sdd1 is now marked as a spare! The command to get
it this far was:

mdadm -v --assemble md1 --force --run sdf1 sdg1 sdp1 sdq1 sdr1 sds1 sdx1 sdd1

And this gives me the following:
----- s n i p -----
md: md1 stopped.
mdadm: looking for devices for md1
mdadm: sdf1 is identified as a member of md1, slot 1.
md: bind<sdg1>
md: bind<sdp1>
md: bind<sdq1>
md: bind<sdr1>
md: bind<sds1>
md: bind<sdx1>
md: bind<sdd1>
md: bind<sdf1>
raid5: device sdf1 operational as raid disk 1
raid5: device sdx1 operational as raid disk 7
raid5: device sds1 operational as raid disk 6
raid5: device sdr1 operational as raid disk 5
raid5: device sdq1 operational as raid disk 4
raid5: device sdp1 operational as raid disk 3
raid5: device sdg1 operational as raid disk 2
raid5: not enough operational devices for md1 (2/9 failed)
RAID5 conf printout:
--- rd:9 wd:7 fd:2
disk 1, o:1, dev:sdf1
disk 2, o:1, dev:sdg1
disk 3, o:1, dev:sdp1
disk 4, o:1, dev:sdq1
disk 5, o:1, dev:sdr1
disk 6, o:1, dev:sds1
disk 7, o:1, dev:sdx1
raid5: failed to run raid set md1
md: pers->run() failed ...
mdadm: sdg1 is identified as a member of md1, slot 2.
mdadm: sdp1 is identified as a member of md1, slot 3.
mdadm: sdq1 is identified as a member of md1, slot 4.
mdadm: sdr1 is identified as a member of md1, slot 5.
mdadm: sds1 is identified as a member of md1, slot 6.
mdadm: sdx1 is identified as a member of md1, slot 7.
mdadm: sdd1 is identified as a member of md1, slot 9.
mdadm: no uptodate device for slot 0 of md1
mdadm: added sdg1 to md1 as 2
mdadm: added sdp1 to md1 as 3
mdadm: added sdq1 to md1 as 4
mdadm: added sdr1 to md1 as 5
mdadm: added sds1 to md1 as 6
mdadm: added sdx1 to md1 as 7
mdadm: no uptodate device for slot 8 of md1
mdadm: added sdd1 to md1 as 9
mdadm: added sdf1 to md1 as 1
mdadm: failed to RUN_ARRAY md1: Invalid argument
----- s n i p -----

I have no idea which disk is supposed to be 0 and/or 8... These are the
disks used when creating the array!
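
To double-check which slots nobody claims, the "identified as a member" lines can be tallied. A small sketch, assuming the array has 9 raid devices (slots 0-8, per "rd:9" in the kernel printout) and using the slot lines from the output above, with the sdf1 line pieced back together from the interleaved console text. The function name missing_slots is made up for illustration.

```shell
# List raid slots 0..(n-1) that no device claimed.
# $1 = number of raid devices; stdin = output of "mdadm -v --assemble".
missing_slots() {
    awk -v n="$1" '
    /identified as a member of/ {
        slot = $NF; sub(/\.$/, "", slot)   # "1." -> "1"
        seen[slot + 0] = 1
    }
    END {
        out = ""
        for (i = 0; i < n; i++)
            if (!(i in seen)) out = out (out == "" ? "" : " ") i
        print out
    }'
}

missing_slots 9 <<'EOF'
mdadm: sdf1 is identified as a member of md1, slot 1.
mdadm: sdg1 is identified as a member of md1, slot 2.
mdadm: sdp1 is identified as a member of md1, slot 3.
mdadm: sdq1 is identified as a member of md1, slot 4.
mdadm: sdr1 is identified as a member of md1, slot 5.
mdadm: sds1 is identified as a member of md1, slot 6.
mdadm: sdx1 is identified as a member of md1, slot 7.
mdadm: sdd1 is identified as a member of md1, slot 9.
EOF
```

This prints "0 8", matching the two "no uptodate device for slot" messages. sdd1 sits in slot 9, outside the 0-8 range, which fits it being listed as a spare.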



