diff mbox series

[bug#64259,2/2] Provide md-array-device-mapping to start MD arrays via UUID or name.

Message ID 4e7eab10caeacfb1f8a0736cdab7154c517b9e36.1687571974.git.felix.lechner@lease-up.com
State New
Series Allow booting of degraded software RAID/MD arrays

Commit Message

Felix Lechner June 24, 2023, 2:07 a.m. UTC
This commit cures the most precipitous danger for users of MD arrays in GNU
Guix, namely that their equipment may not boot after a drive failure. That
behavior likely contradicts their primary expectation for having such a disk
arrangement.

In order to facilitate a smooth transition from raid-device-mapping to
md-array-device-mapping, this commit introduces a new mapping rather than
repurpose the old one. The new mapping here is also incompatible with
raid-device-mapping in the sense that a plain string is now interpreted as the
array name from the MD superblock.

For details, please consult the mdadm manual page.

Personally, the author prefers UUIDs over array names when identifying array
components, but either will work. The system test uses the name.

The name for the new device mapping was chosen instead of the traditional RAID
to account for the fact that some modern technologies (like SSDs) and some
array configurations, such as striping, are neither redundant nor inexpensive.

Adjusts the documentation by erasing any mention of the obsolete
raid-device-mapping. No one should use that any longer. Ideally, users would
see a deprecation warning, but I was unable to adapt 'define-deprecated' to
this use case. Please feel free to make further changes.

This commit includes an updated system test for the root file system on
an MD array.

More details for the motivation of these changes may be available here:

  https://lists.gnu.org/archive/html/guix-devel/2023-04/msg00010.html

The author of this commit used to maintain mdadm in Debian.

Please feel free to insert better changelog messages. I had some difficulty
meeting the likely expectations of any reviewer. Please also feel free to make
any other adjustments as needed without checking with me. Thanks!

* gnu/system/mapped-devices.scm: New variable md-array-device-mapping.
* doc/guix.texi: Mention md-array-device-mapping in the documentation.
* gnu/tests/install.scm: Adjust test for root-on-md-array.
---
 doc/guix.texi                 | 28 ++++++++++++++------------
 gnu/system/mapped-devices.scm | 38 ++++++++++++++++++++++++++++++++++-
 gnu/tests/install.scm         | 32 ++++++++++++++---------------
 3 files changed, 68 insertions(+), 30 deletions(-)

Comments

Ludovic Courtès Oct. 20, 2023, 9:55 p.m. UTC | #1
Hi,

Felix Lechner <felix.lechner@lease-up.com> skribis:

> This commit cures the most precipitous danger for users of MD arrays in GNU
> Guix, namely that their equipment may not boot after a drive failure.

Why would that happen?  Could be because the device names specified in
the ‘source’ field of the mapped device become invalid?

> Adjusts the documentation by erasing any mention of the obsolete
> raid-device-mapping. No one should use that any longer. Ideally, users would
> see a deprecation warning, but I was unable to adapt 'define-deprecated' to
> this use case. Please feel free to make further changes.

If it has to be deprecated then yes, we try and use ‘define-deprecated’.

> Please feel free to insert better changelog messages. I had some difficulty
> meeting the likely expectations of any reviewer. Please also feel free to make
> any other adjustments as needed without checking with me. Thanks!

The reviewer may feel free, sure…  :-)

>  @item source
> -This is either a string specifying the name of the block device to be mapped,
> -such as @code{"/dev/sda3"}, or a list of such strings when several devices
> -need to be assembled for creating a new one.  In case of LVM this is a
> -string specifying name of the volume group to be mapped.
> +This is either a string specifying the name of the block device to be
> +mapped, such as @code{"/dev/sda3"}.  For MD array devices it is either
> +the UUID of the array or a string that is interpreted as the array name
> +(see Mdadm documentation).  In case of LVM it is a string specifying
> +name of the volume group to be mapped.

Instead of “see Mdadm documentation”, could you add a link or a command
to type to access said documentation?  Better yet, an example of what an
mdadm device name or UUID is and how to obtain it would be welcome.

> +(define (open-md-array-device source targets)
> +  "Return a gexp that assembles SOURCE to the MD device
> +TARGET (e.g., \"/dev/md0\"), using 'mdadm'."
> +  (let ((array-selector
> +         (match source
> +           ((? uuid?)
> +            (string-append "--uuid=" (uuid->string source)))
> +           ((? string?)
> +            (string-append "--name=" source))))
> +        (md-device
> +         (match targets
> +           ((target)
> +            target))))
> +    (if (and array-selector md-device)
           ^
This condition is always true.

> +        ;; Use 'mdadm-static' rather than 'mdadm' to avoid pulling its whole
> +        ;; closure (80 MiB) in the initrd when an MD device is needed for boot.
> +        #~(zero? (system* #$(file-append mdadm-static "/sbin/mdadm")
> +                          "--assemble" #$md-device
> +                          "--run"
> +                          #$array-selector))
> +        #f)))
> +
> +(define (close-md-array-device source targets)
> +  "Return a gexp that stops the MD device TARGET."
> +  (match targets
> +    ((target)
> +     #~(zero? (system* #$(file-append mdadm-static "/sbin/mdadm")
> +                       "--stop" #$target)))))
> +
> +(define md-array-device-mapping
> +  ;; The type of MD mapped device.
> +  (mapped-device-kind
> +   (open open-md-array-device)
> +   (close close-md-array-device)))

Instead of renaming and duplicating part of the logic, how about
supporting those new ‘source’ specifications right in ‘open-raid-device’?
It would emit a deprecation warning when ‘source’ is a list of strings.

Does the busy wait loop currently in ‘open-raid-device’ need to be
preserved?

Thanks,
Ludo’.

Felix Lechner Oct. 22, 2023, 5:44 p.m. UTC | #2
Hi,

Thanks for looking at my patch!

On Fri, Oct 20 2023, Ludovic Courtès wrote:

> Could ... the device names specified in
> the ‘source’ field of the mapped device become invalid?

In RAID failures, the devices for defective components usually stop
functioning as intended or become unavailable altogether. Listing those
devices on the mdadm command line, however, requires them to be present
for the assembly of the array. For the fault-tolerant behavior people
expect, arrays should be started via the array name or the special UUID.
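
To make the difference concrete, here is a minimal sketch in Guile of the
two mdadm command lines. The second list follows the arguments that
open-md-array-device passes in this patch. The first is only an assumption
about what a device-list invocation looks like, and both helper names are
hypothetical. The helpers build argument lists and never run mdadm.

```scheme
;; Hypothetical helpers, for illustration only.

(define (assemble-by-devices md-device components)
  ;; Assumed shape of a device-list invocation: every component device
  ;; is named explicitly on the command line.
  (append (list "mdadm" "--assemble" md-device) components))

(define (assemble-by-selector md-device selector)
  ;; Arguments as passed in this patch: SELECTOR is "--uuid=..." or
  ;; "--name=...", so no component device is named explicitly.
  (list "mdadm" "--assemble" md-device "--run" selector))

(assemble-by-devices "/dev/md0" '("/dev/sda1" "/dev/sdb1"))
(assemble-by-selector "/dev/md0" "--name=mirror0")
```

As I read the mdadm manual page, --run asks mdadm to start the array even
when fewer components are found than were present when it was last active.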

> try and use ‘define-deprecated’.

Yes, thank you! I will do so and deploy locally before I update the
patch herein.

> Instead of “see Mdadm documentation”, could you add a link or a command
> to type to access said documentation?

Upon review, I am not sure that the mdadm documentation is actually very
helpful. I must have been thinking about third-party sites.

> Better yet, an example of what an
> mdadm device name or UUID is and how to obtain it would be welcome.

Yes, I will include examples on how to access both, and how to change
the "array name." The latter can be a chosen string that is optionally
prefaced by the local host name. The array name is not the same as the
device name, which looks like /dev/md12. I shall clarify all that in the
revised patch.
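
As a rough preview, here is a sketch that reuses values already present in
this patch: the system test creates its array with --homehost=marionette
--name=mirror0, and the UUID is the one from the documentation hunk. The
array name or UUID goes into 'source', the device name into 'target'. To my
knowledge, 'mdadm --detail /dev/md0' prints both the Name and the UUID of an
assembled array.

```scheme
;; Selecting the array by name (home host, colon, chosen string).
(mapped-device
  (source "marionette:mirror0")   ;array name from the MD superblock
  (target "/dev/md0")             ;device name of the assembled array
  (type md-array-device-mapping))

;; Selecting the array by UUID instead.
(mapped-device
  (source (uuid "33cf3e31:8e33d75b:517d64b9:0a8f7623" 'mdadm))
  (target "/dev/md17")
  (type md-array-device-mapping))
```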

>> +    (if (and array-selector md-device)
>            ^
> This condition is always true.

Okay, I may not know Guile macros well enough.
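
For the record, here is a minimal standalone sketch of why that test cannot
fail, assuming Guile's (ice-9 match): a 'match' with no matching clause
raises an error instead of returning #f, and any string it does return
counts as true. The procedure names are hypothetical and only mirror the
string clause from the patch.

```scheme
(use-modules (ice-9 match))

(define (array-selector source)
  ;; Simplified stand-in for the selector logic: strings only.
  (match source
    ((? string?) (string-append "--name=" source))))

(define (selector-or-failure source)
  ;; Turn the error raised by a failed 'match' into a symbol so that
  ;; the behavior can be observed without aborting.
  (catch #t
    (lambda () (array-selector source))
    (lambda (key . args) 'match-failed)))

(array-selector "mirror0")   ;a string, hence a true value
(selector-or-failure 42)     ;'match' raised; it did not return #f
```

Either the bindings hold true values or an error is raised before the 'if'
is reached, so the #f branch is dead code.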

> Instead of renaming and duplicating part of the logic, how about
> supporting those new ‘source’ specifications right in ‘open-raid-device’?
> It would emit a deprecation warning when ‘source’ is a list of strings.

It's a good idea, but many other file systems offer RAID-type
functionality. Do you think that a raid-device-mapping based on mdadm
occupies a fair share in the common name space?

> Does the busy wait loop currently in ‘open-raid-device’ need to be
> preserved?

I personally do not believe so but I'll defer to Andreas Enge, whom I
copied on this message. I believe Andreas wrote the original device
mapping.

Kind regards
Felix

Andreas Enge Jan. 18, 2024, 2:39 p.m. UTC | #3
Hello,

Am Sun, Oct 22, 2023 at 10:44:13AM -0700 schrieb Felix Lechner:
> > Does the busy wait loop currently in ‘open-raid-device’ need to be
> > preserved?
> I personally do not believe so but I'll defer to Andreas Enge, whom I
> copied on this message. I believe Andreas wrote the original device
> mapping.

well, I do not know whether it is still needed. It appears that when
I wrote the code for bayfront, we needed to wait a bit until the hard
disks appeared. Are there reasons to believe that this has changed
in the meantime?

Andreas

Felix Lechner Jan. 18, 2024, 4:46 p.m. UTC | #4
Hi Andreas,

On Thu, Jan 18 2024, Andreas Enge wrote:

> when I wrote the code for bayfront, we needed to wait a bit until the
> hard disks appeared.

How long ago? I never needed it on six pieces of equipment of varying
dimensions, including an SAS server (with reflashed Dell equipment, if
that's what Bayfront is using) and a VM with NVMe SSDs.

Either way, accepting this patch as is now will not break anything. We
introduce a new mapping (md-array-device-mapping) that can be used to
reconfigure Bayfront at the maintainer's leisure.

It would be easy to react to a bug report later while Bayfront continues
to use raid-device-mapping.

Kind regards
Felix

Andreas Enge Jan. 18, 2024, 4:51 p.m. UTC | #5
Am Thu, Jan 18, 2024 at 08:46:24AM -0800 schrieb Felix Lechner:
> Either way, accepting this patch as is now will not break anything. We
> introduce a new mapping (md-array-device-mapping) that can be used to
> reconfigure Bayfront at the maintainer's leisure.
> It would be easy to react to a bug report later while Bayfront continues
> to use raid-device-mapping.

Okay, that is fine with me!

Andreas

Patch

diff --git a/doc/guix.texi b/doc/guix.texi
index c961f706ec..91125479b1 100644
--- a/doc/guix.texi
+++ b/doc/guix.texi
@@ -17513,18 +17513,19 @@  the system boots up.
 
 @table @code
 @item source
-This is either a string specifying the name of the block device to be mapped,
-such as @code{"/dev/sda3"}, or a list of such strings when several devices
-need to be assembled for creating a new one.  In case of LVM this is a
-string specifying name of the volume group to be mapped.
+This is either a string specifying the name of the block device to be
+mapped, such as @code{"/dev/sda3"}.  For MD array devices it is either
+the UUID of the array or a string that is interpreted as the array name
+(see Mdadm documentation).  In case of LVM it is a string specifying
+name of the volume group to be mapped.
 
 @item target
 This string specifies the name of the resulting mapped device.  For
 kernel mappers such as encrypted devices of type @code{luks-device-mapping},
 specifying @code{"my-partition"} leads to the creation of
 the @code{"/dev/mapper/my-partition"} device.
-For RAID devices of type @code{raid-device-mapping}, the full device name
-such as @code{"/dev/md0"} needs to be given.
+For MD array devices of type @code{md-array-device-mapping}, the full device
+name such as @code{"/dev/md18"} needs to be given.
 LVM logical volumes of type @code{lvm-device-mapping} need to
 be specified as @code{"VGNAME-LVNAME"}.
 
@@ -17544,11 +17545,12 @@  command from the package with the same name.  It relies on the
 @code{dm-crypt} Linux kernel module.
 @end defvar
 
-@defvar raid-device-mapping
+@defvar md-array-device-mapping
 This defines a RAID device, which is assembled using the @code{mdadm}
-command from the package with the same name.  It requires a Linux kernel
-module for the appropriate RAID level to be loaded, such as @code{raid456}
-for RAID-4, RAID-5 or RAID-6, or @code{raid10} for RAID-10.
+command from the package with the same name.  It requires the Linux kernel
+module for the appropriate RAID level to be loaded, such as @code{raid1}
+for mirroring, @code{raid456} for the checksum-based RAID levels 4, 5 or 6,
+or @code{raid10} for RAID-10.
 @end defvar
 
 @cindex LVM, logical volume manager
@@ -17606,9 +17608,9 @@  may be declared as follows:
 
 @lisp
 (mapped-device
-  (source (list "/dev/sda1" "/dev/sdb1"))
-  (target "/dev/md0")
-  (type raid-device-mapping))
+  (source (uuid "33cf3e31:8e33d75b:517d64b9:0a8f7623" 'mdadm))
+  (target "/dev/md17")
+  (type md-array-device-mapping))
 @end lisp
 
 The @file{/dev/md0} device can then be used as the @code{device} of a
diff --git a/gnu/system/mapped-devices.scm b/gnu/system/mapped-devices.scm
index e6b8970c12..ffe5bc00f4 100644
--- a/gnu/system/mapped-devices.scm
+++ b/gnu/system/mapped-devices.scm
@@ -64,6 +64,7 @@  (define-module (gnu system mapped-devices)
             check-device-initrd-modules           ;XXX: needs a better place
 
             luks-device-mapping
+            md-array-device-mapping
             raid-device-mapping
             lvm-device-mapping))
 
@@ -317,11 +318,46 @@  (define raid-device-mapping
    (open open-raid-device)
    (close close-raid-device)))
 
+(define (open-md-array-device source targets)
+  "Return a gexp that assembles SOURCE to the MD device
+TARGET (e.g., \"/dev/md0\"), using 'mdadm'."
+  (let ((array-selector
+         (match source
+           ((? uuid?)
+            (string-append "--uuid=" (uuid->string source)))
+           ((? string?)
+            (string-append "--name=" source))))
+        (md-device
+         (match targets
+           ((target)
+            target))))
+    (if (and array-selector md-device)
+        ;; Use 'mdadm-static' rather than 'mdadm' to avoid pulling its whole
+        ;; closure (80 MiB) in the initrd when an MD device is needed for boot.
+        #~(zero? (system* #$(file-append mdadm-static "/sbin/mdadm")
+                          "--assemble" #$md-device
+                          "--run"
+                          #$array-selector))
+        #f)))
+
+(define (close-md-array-device source targets)
+  "Return a gexp that stops the MD device TARGET."
+  (match targets
+    ((target)
+     #~(zero? (system* #$(file-append mdadm-static "/sbin/mdadm")
+                       "--stop" #$target)))))
+
+(define md-array-device-mapping
+  ;; The type of MD mapped device.
+  (mapped-device-kind
+   (open open-md-array-device)
+   (close close-md-array-device)))
+
 (define (open-lvm-device source targets)
   #~(and
      (zero? (system* #$(file-append lvm2-static "/sbin/lvm")
                      "vgchange" "--activate" "ay" #$source))
-     ; /dev/mapper nodes are usually created by udev, but udev may be unavailable at the time we run this. So we create them here.
+                                        ; /dev/mapper nodes are usually created by udev, but udev may be unavailable at the time we run this. So we create them here.
      (zero? (system* #$(file-append lvm2-static "/sbin/lvm")
                      "vgscan" "--mknodes"))
      (every file-exists? (map (lambda (file) (string-append "/dev/mapper/" file))
diff --git a/gnu/tests/install.scm b/gnu/tests/install.scm
index 0f4204d1a6..061365fd87 100644
--- a/gnu/tests/install.scm
+++ b/gnu/tests/install.scm
@@ -64,7 +64,7 @@  (define-module (gnu tests install)
             %test-iso-image-installer
             %test-separate-store-os
             %test-separate-home-os
-            %test-raid-root-os
+            %test-md-array-root-os
             %test-encrypted-root-os
             %test-encrypted-home-os
             %test-encrypted-root-not-boot-os
@@ -612,11 +612,11 @@  (define %test-separate-store-os
 
 
 ;;;
-;;; RAID root device.
+;;; MD root device.
 ;;;
 
-(define-os-with-source (%raid-root-os %raid-root-os-source)
-  ;; An OS whose root partition is a RAID partition.
+(define-os-with-source (%md-array-root-os %md-array-root-os-source)
+  ;; An OS whose root partition is a MD partition.
   (use-modules (gnu) (gnu tests))
 
   (operating-system
@@ -633,9 +633,9 @@  (define-os-with-source (%raid-root-os %raid-root-os-source)
     (initrd-modules (cons "raid1" %base-initrd-modules))
 
     (mapped-devices (list (mapped-device
-                           (source (list "/dev/vda2" "/dev/vda3"))
+                           (source "marionette:mirror0")
                            (target "/dev/md0")
-                           (type raid-device-mapping))))
+                           (type md-array-device-mapping))))
     (file-systems (cons (file-system
                           (device (file-system-label "root-fs"))
                           (mount-point "/")
@@ -649,7 +649,7 @@  (define-os-with-source (%raid-root-os %raid-root-os-source)
                                                   (guix combinators)))))
                     %base-services))))
 
-(define %raid-root-installation-script
+(define %md-array-root-installation-script
   ;; Installation with a separate /gnu partition.  See
   ;; <https://raid.wiki.kernel.org/index.php/RAID_setup> for more on RAID and
   ;; mdadm.
@@ -665,8 +665,8 @@  (define %raid-root-installation-script
   mkpart primary ext2 1.6G 3.2G \\
   set 1 boot on \\
   set 1 bios_grub on
-yes | mdadm --create /dev/md0 --verbose --level=mirror --raid-devices=2 \\
-  /dev/vdb2 /dev/vdb3
+yes | mdadm --create /dev/md0 --verbose --homehost=marionette --name=mirror0 \\
+  --level=mirror --raid-devices=2 /dev/vdb2 /dev/vdb3
 mkfs.ext4 -L root-fs /dev/md0
 mount /dev/md0 /mnt
 df -h /mnt
@@ -677,21 +677,21 @@  (define %raid-root-installation-script
 sync
 reboot\n")
 
-(define %test-raid-root-os
+(define %test-md-array-root-os
   (system-test
-   (name "raid-root-os")
+   (name "md-array-root-os")
    (description
     "Test functionality of an OS installed with a RAID root partition managed
 by 'mdadm'.")
    (value
-    (mlet* %store-monad ((images (run-install %raid-root-os
-                                              %raid-root-os-source
+    (mlet* %store-monad ((images (run-install %md-array-root-os
+                                              %md-array-root-os-source
                                               #:script
-                                              %raid-root-installation-script
+                                              %md-array-root-installation-script
                                               #:target-size (* 3200 MiB)))
                          (command (qemu-command* images)))
-      (run-basic-test %raid-root-os
-                      `(,@command) "raid-root-os")))))
+      (run-basic-test %md-array-root-os
+                      `(,@command) "md-array-root-os")))))
 
 
 ;;;