Set up BeeGFS 8.3 file system
In this section, you will learn how to install the BeeGFS file system on Pouta.
This deployment guide has been tested on AlmaLinux 9.
What is BeeGFS
Developed with a strong focus on performance and designed for ease of use, simple installation, and management, BeeGFS is one of the leading parallel file systems that continues to grow and gain significant popularity in the community. BeeGFS has evolved into a world-wide valued filesystem offering maximum performance, scalability, high flexibility, and robustness.
BeeGFS is available free of charge for end-users. For enterprise systems, professional commercial support is also available, typically in cooperation with our international turn-key solution partners.
Key Benefits
- Distributed File Contents and Metadata
One of the most fundamental concepts of BeeGFS is the strict avoidance of architectural bottlenecks. Striping file contents across multiple storage servers is only one part of this concept. Another important aspect is the distribution of file system metadata (e.g., directory information) across multiple metadata servers. Large systems and metadata intensive applications, in general, can greatly profit from the latter feature.
- HPC Technologies
BeeGFS is built on highly efficient and scalable multithreaded core components with native RDMA support. File system nodes can serve RDMA (InfiniBand, Omni-Path, RoCE) and TCP/IP network connections at the same time and automatically switch to a redundant connection path in case any of them fails.
- Easy to Use
BeeGFS requires no kernel patches (the client is a patchless kernel module, the server components are userspace daemons), comes with graphical cluster installation tools and allows you to add more clients and servers to the running system whenever you want it.
- Optimized for Highly Concurrent Access
Simple remote file systems like NFS do not only have serious performance problems in case of highly concurrent access, they can even corrupt data when multiple clients write to the same shared file, which is a typical use-case for cluster applications. BeeGFS was specifically designed with such use-cases in mind to deliver optimal robustness and performance in situations of high I/O load.
- Client and Server on any Machine
No specific enterprise Linux distribution or other special environment is required to run BeeGFS. BeeGFS client and servers can even run on the same machine to enable performance increases for small clusters or networks. BeeGFS requires no dedicated file system partition on the servers; it uses existing partitions, formatted with any of the standard Linux file systems, e.g., XFS, ext4 or ZFS. For larger networks, it is also possible to create several distinct BeeGFS file system partitions with different configurations.
System Requirements
Check out the official documentation
We will store the metadata on a separate volume. As mentioned in the documentation:
As a rule of thumb, 500GB of metadata capacity are sufficient for about 150 million files. Note that while ext4 is generally recommended for metadata storage because of its performance advantages for BeeGFS metadata workloads compared to other local Linux file systems, XFS has the advantage of using a dynamic number of inodes, meaning new inodes can be created as long as there is free disk space.
Quick Start Guide
The BeeGFS documentation is very rich and instructive on how to install BeeGFS 8.3.
In this tutorial, we use the following:
- Software: AlmaLinux 9
- Host Services:
  - beegfs-meta: Metadata server with a 500 GB volume attached
  - beegfs-mgmt: Management server
  - beegfs-storage-1: Storage server 1 with a 1000 GB volume attached
  - beegfs-storage-2: Storage server 2 with two 1000 GB volumes attached
  - beegfs-storage-3: Storage server 3 with a 1000 GB volume attached
  - beegfs-client-1: Client 1
  - beegfs-client-2: Client 2
Package download and installation
You can find the different packages from this address: https://www.beegfs.io/release/
For AlmaLinux 9, run these commands as the root user on every node:
yum install -y epel-release wget vim
yum update -y
rpm --import https://www.beegfs.io/release/beegfs_8.3/gpg/GPG-KEY-beegfs
wget https://www.beegfs.io/release/beegfs_8.3/dists/beegfs-rhel9.repo -O /etc/yum.repos.d/beegfs.repo
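Then, as the root user, install the package that matches each node's role: the management service on the beegfs-mgmt node, the metadata service on the beegfs-meta node, the storage service on the beegfs-storage nodes, and the client on the beegfs-client nodes.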
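The role-specific install step can be sketched as follows. This is a minimal sketch, not the official procedure: the package names (`beegfs-mgmtd`, `beegfs-meta`, `beegfs-storage`, `beegfs-client`, `beegfs-tools`) are assumptions derived from the service names used in this guide, and `beegfs_install_cmd` is a hypothetical helper. It only prints the commands, so you can check them against the repository before running them as root.

```shell
# Print the install command for one node role; nothing is installed.
# Package names are assumptions based on the BeeGFS service names.
beegfs_install_cmd() {
  case "$1" in
    mgmt)    echo "yum install -y beegfs-mgmtd" ;;
    meta)    echo "yum install -y beegfs-meta" ;;
    storage) echo "yum install -y beegfs-storage" ;;
    client)  echo "yum install -y beegfs-client beegfs-tools" ;;
    *)       echo "unknown role: $1" >&2; return 1 ;;
  esac
}

for role in mgmt meta storage client; do
  echo "$role: $(beegfs_install_cmd "$role")"
done
```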
Configuration
Management service
To initialize the database for a new BeeGFS installation, run the management service initialization on the beegfs-mgmt node as the root user.
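A hedged sketch of the initialization call is shown below. The binary path `/opt/beegfs/sbin/beegfs-mgmtd` and its `--init` flag are assumptions to verify on your node, and `run` is a hypothetical dry-run helper that only prints the command unless `APPLY=1` is set.

```shell
# Dry-run helper: print the command unless APPLY=1 is set in the environment.
run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  else
    echo "would run: $*"
  fi
}

# Assumed location and flag of the management daemon's init mode.
run /opt/beegfs/sbin/beegfs-mgmtd --init
```

Running the script with `APPLY=1` as root executes the command instead of printing it.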
Edit the file /etc/beegfs/beegfs-mgmtd.toml.
By default, the management service requires TLS to be configured. You can disable it by setting tls-disable = true.
We recommend changing the log level to debug by setting log-level = "debug".
Save and exit, then restart the beegfs-mgmtd service.
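To double-check the two settings before restarting, a small reader like the one below can help. It is a sketch only: `toml_get` is a hypothetical helper that handles flat `key = value` lines (no tables, no trailing comments), and the demo runs on a scratch file rather than on the real configuration.

```shell
# Read the last `key = value` assignment for a key from a flat TOML file.
toml_get() {
  sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*//p" "$2" | tail -n 1
}

# Demo on a scratch file; point CONF at /etc/beegfs/beegfs-mgmtd.toml on the real node.
CONF="$(mktemp)"
printf '%s\n' 'tls-disable = true' 'log-level = "debug"' > "$CONF"

echo "tls-disable: $(toml_get tls-disable "$CONF")"
echo "log-level:   $(toml_get log-level "$CONF")"

# On the management node, restart afterwards (as root):
#   systemctl restart beegfs-mgmtd
```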
Metadata service
Earlier, we presented the different Host Services. In this tutorial, we consider the server beegfs-meta with a 500 GB volume attached.
First, we need to prepare the volume. Our documentation describes how to use an attached volume.
Here are the detailed instructions for our case:
- List the volumes.
- Assuming that the attached volume is /dev/vdb, format it.
- Create the folder /mnt/beegfs_meta.
- Mount the volume on this folder.
- Edit /etc/fstab and add an entry for the volume so that it is mounted again after a reboot.
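The five steps above can be sketched as a script that prints the commands for review. The file system type (`ext4`), the `defaults,nofail` mount options, and the helper `fstab_entry` are assumptions for illustration, not choices confirmed by this guide; adjust them before running anything as root.

```shell
DEVICE="${DEVICE:-/dev/vdb}"
MOUNT_POINT="${MOUNT_POINT:-/mnt/beegfs_meta}"
FS_TYPE="${FS_TYPE:-ext4}"   # assumption; XFS is the other common choice

# Build the /etc/fstab line; `nofail` keeps the VM bootable if the volume is detached.
fstab_entry() {
  printf '%s %s %s defaults,nofail 0 2\n' "$1" "$2" "$3"
}

# Printed for review only; run each line as root once it matches your setup.
cat <<EOF
lsblk
mkfs.$FS_TYPE $DEVICE
mkdir -p $MOUNT_POINT
mount $DEVICE $MOUNT_POINT
echo '$(fstab_entry "$DEVICE" "$MOUNT_POINT" "$FS_TYPE")' >> /etc/fstab
EOF
```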
Once the volume is attached and mounted, we can proceed with the initialisation of the metadata service (with the root user).
The metadata service needs to know where it can store its data and where the management service is running. Typically, you will have multiple metadata services running on different machines. In our case, we only use one.
Optionally, you can also define a custom numeric metadata service ID (range 1..65535). We will pick number 2 as metadata service ID here.
Replace IP_ADDRESS_MGMT_NODE by the internal IP of your beegfs-mgmt server
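A sketch of the registration command, with a check that the service ID lies in the documented range, might look like this. `valid_beegfs_id` and `meta_setup_cmd` are hypothetical helpers, and the data path `/mnt/beegfs_meta` assumes the mount point prepared earlier. The script prints the command instead of running it.

```shell
# Succeeds only for a numeric ID in the documented range 1..65535.
valid_beegfs_id() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "${#1}" -le 5 ] && [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

# Print the registration command: data path, service ID, management address.
meta_setup_cmd() {
  valid_beegfs_id "$2" || { echo "invalid service ID: $2" >&2; return 1; }
  echo "/opt/beegfs/sbin/beegfs-setup-meta -p $1 -s $2 -m $3"
}

meta_setup_cmd /mnt/beegfs_meta 2 IP_ADDRESS_MGMT_NODE
```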
Finally, open the file /etc/beegfs/beegfs-meta.conf and verify that storeClientXAttrs=true.
Storage service
Earlier, we presented the different Host Services. In this tutorial, we consider:
- beegfs-storage-1 with a 1000 GB volume attached.
- beegfs-storage-2 with two 1000 GB volumes attached.
- beegfs-storage-3 with a 1000 GB volume attached.
The instructions to prepare the volumes are the same as previously for the Metadata service. You can also check our documentation.
The storage service needs to know where it can store its data and how to reach the management server.
Typically, you will have multiple storage services running on different machines and/or multiple storage targets (e.g., multiple RAID volumes) per storage service.
Optionally, you can also define a custom numeric storage service ID and numeric storage target ID (both in range 1..65535). As this service is running on a server with name beegfs-storage-1 in our example, we will pick number 1 as ID for this storage service and we will use 101 as storage target ID to show that this is the first target (01) of storage service 1.
With the root user, run:
Replace IP_ADDRESS_MGMT_NODE by the internal IP of your beegfs-mgmt server
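The ID convention described above (service ID followed by a two-digit target index) can be sketched for the single-volume nodes as follows. `target_id` is a hypothetical helper and `/mnt/beegfs_storage` is an assumed mount point; the script only prints the commands.

```shell
# Compose a target ID from the service ID and a two-digit target index,
# e.g. service 1, target 1 -> 101.
target_id() {
  printf '%d%02d\n' "$1" "$2"
}

MGMT="IP_ADDRESS_MGMT_NODE"      # replace with the internal IP of beegfs-mgmt
DATA_PATH="/mnt/beegfs_storage"  # hypothetical mount point of the volume

# beegfs-storage-1: service 1, first target
echo "/opt/beegfs/sbin/beegfs-setup-storage -p $DATA_PATH -s 1 -i $(target_id 1 1) -m $MGMT"
# beegfs-storage-3: service 3, first target
echo "/opt/beegfs/sbin/beegfs-setup-storage -p $DATA_PATH -s 3 -i $(target_id 3 1) -m $MGMT"
```

Because target IDs are also limited to 65535, this convention only works for small service IDs and at most 99 targets per service.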
The node beegfs-storage-2 has two 1000 GB volumes attached. You can use these commands to register both volumes under the same service ID:
/opt/beegfs/sbin/beegfs-setup-storage -p /mnt/beegfs_storage-1 -s 2 -i 201 -m IP_ADDRESS_MGMT_NODE
/opt/beegfs/sbin/beegfs-setup-storage -p /mnt/beegfs_storage-2 -s 2 -i 202
Client
The clients need to know where the management service is running.
With the root user, run:
Replace IP_ADDRESS_MGMT_NODE by the internal IP of your beegfs-mgmt server
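A minimal sketch of the client setup call is shown below. The `beegfs-setup-client` path follows the pattern of the other setup scripts in this guide and should be verified on your node; `client_setup_cmd` is a hypothetical helper that refuses to build the command while the placeholder is still in place, and `192.0.2.10` is a documentation-only example address.

```shell
# Print the client setup command, refusing an empty or placeholder address.
client_setup_cmd() {
  case "$1" in
    ''|IP_ADDRESS_MGMT_NODE)
      echo "set the internal IP of beegfs-mgmt first" >&2
      return 1 ;;
  esac
  echo "/opt/beegfs/sbin/beegfs-setup-client -m $1"
}

MGMT_ADDR="${MGMT_ADDR:-192.0.2.10}"   # documentation-only example address
client_setup_cmd "$MGMT_ADDR"
```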
The client mount directory is defined in a separate configuration file. This file will be used by the beegfs-client service startup script. By default, BeeGFS will be mounted to /mnt/beegfs. Thus, you need to perform this step only if you want to mount the file system to a different location.
The first entry defines the mount directory. The second entry refers to the corresponding configuration file for this mount point.
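The two-entry line can be illustrated with the default values. This sketch assumes the mounts file is /etc/beegfs/beegfs-mounts.conf, which this guide does not name explicitly; `mounts_entry` is a hypothetical helper, and the demo writes to a scratch file instead of the real one.

```shell
# One line per mount: <mount directory> <client configuration file>
mounts_entry() {
  printf '%s %s\n' "$1" "$2"
}

# Written to a scratch file here; the real file is assumed to be
# /etc/beegfs/beegfs-mounts.conf.
MOUNTS_FILE="$(mktemp)"
mounts_entry /mnt/beegfs /etc/beegfs/beegfs-client.conf > "$MOUNTS_FILE"
cat "$MOUNTS_FILE"
```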
Finally, open the file /etc/beegfs/beegfs-client.conf and verify that sysSELinuxEnabled=true.
Authentication
It is highly recommended to secure your BeeGFS installation by enabling connection-based authentication using a shared secret.
For our case, we will distribute a shared secret to all nodes and set the connAuthFile parameter in each service and client configuration. By default, services and clients expect the secret at /etc/beegfs/conn.auth.
Disabling authentication - NOT RECOMMENDED
You can disable the authentication by setting connDisableAuthentication=true in all service and client configuration files (located in /etc/beegfs/*.conf) and setting auth-disable=true in the Management Service configuration (/etc/beegfs/beegfs-mgmtd.toml). The services need to be restarted for the changes to take effect.
- Create a file which contains a shared secret.
- Ensure the file is only readable by the root user.
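One possible way to carry out these two steps is sketched below. The 128-byte size and the use of /dev/urandom are assumptions, and `make_conn_auth` is a hypothetical helper; the demo writes to a scratch directory so it can be tried without root.

```shell
# Create a 128-byte random secret readable only by its owner.
make_conn_auth() {
  dd if=/dev/urandom of="$1" bs=128 count=1 2>/dev/null
  chmod 400 "$1"
}

# Scratch path for the demo; on the cluster use /etc/beegfs/conn.auth as root
# and also run: chown root:root /etc/beegfs/conn.auth
SECRET_FILE="$(mktemp -d)/conn.auth"
make_conn_auth "$SECRET_FILE"
ls -l "$SECRET_FILE"
```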
Copy the file to all hosts in the cluster (mgmtd, meta, storage, client).
If this is a new BeeGFS 8 installation, the default configuration files for all services and clients will already have the authentication file path set to /etc/beegfs/conn.auth.
If you are enabling connection authentication for the first time, in your management configuration file set auth-file = "/etc/beegfs/conn.auth" and for all other clients and servers set connAuthFile=/etc/beegfs/conn.auth.
Restart the services so the change takes effect:
systemctl restart beegfs-mgmtd.service
systemctl restart beegfs-meta.service
systemctl restart beegfs-storage.service
systemctl restart beegfs-client.service
systemctl restart beegfs-mon.service
After all services have restarted correctly, no client without the shared secret file can connect to the cluster. When log-level = "debug" is set, the management daemon logs the connection attempts it rejects for lack of the shared secret.
You can change the secret on an already configured/secured cluster. Just replace the file conn.auth on all nodes with a new one.
Service startup
BeeGFS services can be started in arbitrary order by using the corresponding systemctl service scripts. By default, all services log to the system journal (use the -u option of journalctl to show the log of a single service).
With the root user, run:
systemctl start beegfs-mgmtd
systemctl start beegfs-meta
systemctl start beegfs-storage # repeat this operation for all storage nodes
systemctl start beegfs-client # repeat this operation for all client nodes
Note
BeeGFS clients have a mount sanity check and cancel a mount operation if servers are unreachable. If you want to mount even with unreachable servers, set sysMountSanityCheckMS=0 in the file /etc/beegfs/beegfs-client.conf.
Congratulations, your parallel file system is now up and running!
Check connectivity
Set up your client node to interact with BeeGFS using the new beegfs tool.
The new tool does not use a configuration file, but rather uses flags and/or environment variables for configuration. This means if you want persistent configuration for the tool it can be set in your ~/.bashrc file (or equivalent for your shell).
If you followed the recommendations in the quick start guide no additional configuration is required, and the tool will work out of the box. Below is common configuration to be aware of in case you deviated from the guide.
If you choose to use a self-signed TLS certificate and enable connection authentication, as long as the TLS certificate and secret are already at /etc/beegfs/cert.pem and /etc/beegfs/conn.auth on the machine where you are running the tool, then no configuration is needed. If you choose to disable TLS and/or connection authentication you will also need to specify those options on the client nodes:
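As a sketch of such persistent configuration, the lines below could go into ~/.bashrc. The variable names `BEEGFS_TLS_DISABLE` and `BEEGFS_AUTH_DISABLE` are assumptions that mirror the management settings tls-disable and auth-disable; confirm them in the help output of the beegfs tool before relying on them.

```shell
# Environment variable names are assumptions; confirm them with `beegfs --help`.
export BEEGFS_TLS_DISABLE=true    # only if TLS is disabled on the management service
export BEEGFS_AUTH_DISABLE=true   # only if connection authentication is disabled

# Show what the tool would see in its environment.
env | grep '^BEEGFS_' | sort
```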
Once you have the correct configuration in place, on the client nodes, check the detected network interfaces and transport protocols with the following commands (with the root user):
beegfs node list --with-nics
beegfs health net # Display connections the client is actually using.
beegfs health df # Display free space and inodes on metadata and storage targets.
beegfs health check # Check for common issues.
If you intend to use the BeeGFS community edition you will have seen warnings when running commands that the system is unlicensed. Run beegfs license and follow the steps to obtain a license.
Licensing
Obtaining licenses
To obtain a community license, upgrade to BeeGFS 8.3+ and run beegfs license. This generates a system-specific URL that automatically provides limited information about the deployment, including the storage capacity, the number of metadata and storage services, and the network protocol.
Configuring licensing
These steps can be followed to enable licensing on a new or existing BeeGFS install:
- Install the libbeegfs-license package on all machines that may run the BeeGFS management service. If the BeeGFS management service can run on multiple machines, install this package everywhere the service can run. Starting with BeeGFS 8.3, this package is installed automatically on all machines where the management package is installed.
  Note: without the management service running on the same machine, installing this package has no effect, so it is safe to install anywhere it might be needed.
- On all machines where the BeeGFS management service can run, download your license file to /etc/beegfs/license.pem.
- Perform one of the following options to reload the license:
  - If this is a new install, simply start the management service. Its log confirms whether the license verification library and the license are installed correctly.
  - If this is an existing deployment, run beegfs license --reload from a machine with the beegfs-tools package installed to install a new license without requiring any downtime. The license details are returned on success.
- Optionally, verify the details of the installed license with beegfs license.
Renewing or updating a license
If the license ever needs to be updated, for example when extending support contract duration or adding additional servers, simply place the new license file at the path configured in the management service and run the beegfs license --reload command. The new license will be applied immediately and without the need for a restart of the management service.
About the license
- You can freely use BeeGFS Community Edition for your projects without paying.
- There are no fees or mandatory constraints on usage for either personal or commercial use.
- Redistribution and modification have some restrictions under the Community License.
- For enterprise needs, support, or advanced features, you’d consider the paid Enterprise Edition.
The BeeGFS Community Edition is governed by the BeeGFS Community License, which you accept by using the software. If you do not agree to the license terms, you should not install, copy, or use BeeGFS.
Key points from the license include:
- The license allows you to use the software freely, but it comes with terms you must comply with.
- Redistribution and modification are restricted; you cannot redistribute modified versions without explicit permission.
- The software is provided "as-is," meaning there is no warranty or guarantee for its performance or suitability.
- The license is designed to protect the intellectual property of the BeeGFS creators while allowing free use for both personal and commercial purposes.
Migrate the metadata nodes
If you have deployed your metadata on the local disk, you won't be able to extend the volume.
You can migrate the metadata from the root file system (/) to a mounted volume. A convenient way to do this is rsync.
Once your volume has been formatted and mounted, you can run these commands (as the root user):
systemctl stop beegfs-meta
rsync -aHAX --numeric-ids /data/beegfs/beegfs_meta/ /mnt/beegfs_meta/
systemctl start beegfs-meta
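If the metadata service still points at the old directory, the copy alone is not enough. The sketch below shows one way to update the path while the service is stopped, assuming the location is read from the `storeMetaDirectory` key in /etc/beegfs/beegfs-meta.conf. `set_conf_value` is a hypothetical helper, and the demo edits a scratch copy rather than the real file.

```shell
# Replace the value of an existing `key = value` line in a BeeGFS .conf file.
set_conf_value() {
  file="$1"; key="$2"; value="$3"
  tmp="$(mktemp)"
  sed "s|^\([[:space:]]*$key[[:space:]]*=\).*|\1 $value|" "$file" > "$tmp" &&
    cat "$tmp" > "$file"   # overwrite in place to keep the file's owner and mode
  rm -f "$tmp"
}

# Demo on a scratch copy; the real file would be /etc/beegfs/beegfs-meta.conf.
CONF="$(mktemp)"
printf '%s\n' 'storeMetaDirectory = /data/beegfs/beegfs_meta' 'storeClientXAttrs = true' > "$CONF"
set_conf_value "$CONF" storeMetaDirectory /mnt/beegfs_meta
grep '^storeMetaDirectory' "$CONF"
```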
Important
DO NOT re-register the metadata service with /opt/beegfs/sbin/beegfs-setup-meta -p /data/beegfs/beegfs_meta -s 2 -m IP_ADDRESS_MGMT_NODE. Doing so generates a new registration token that confuses the management service, and the metadata service will not be able to start.
Benchmark
BeeGFS comes with a tool that allows you to benchmark the storage.
Run the benchmark commands as the root user.
A typical example is a write benchmark on all targets of all BeeGFS storage servers with an IO block size of 512 KB, using 10 threads (i.e., simulated client streams) per target, each of which writes 200 GB of data to its own file.
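Before starting such a run, it is worth checking how much data it would write. The sketch below does the arithmetic for this tutorial's layout, assuming four storage targets of 1000 GB each; `bench_total_gb` is a hypothetical helper.

```shell
# Total data a write benchmark produces: targets x threads x size per thread (GB).
bench_total_gb() {
  echo $(( $1 * $2 * $3 ))
}

TARGETS=4          # 1 + 2 + 1 targets on the three storage nodes of this tutorial
THREADS=10
SIZE_GB=200
VOLUME_GB=1000     # size of each attached storage volume

per_target=$(bench_total_gb 1 "$THREADS" "$SIZE_GB")
echo "per target: ${per_target} GB, all targets: $(bench_total_gb "$TARGETS" "$THREADS" "$SIZE_GB") GB"
if [ "$per_target" -gt "$VOLUME_GB" ]; then
  echo "reduce the size or thread count: ${per_target} GB exceeds a ${VOLUME_GB} GB volume"
fi
```

With these numbers each target would receive 2000 GB, so on 1000 GB volumes the file size or the thread count has to be reduced.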
Once a benchmark has started, you can query its status and results for all targets.
The watch flag repeats the query at a given interval in seconds.
The generated files are not deleted automatically when a benchmark completes, so delete them with the tool once you no longer need them.
More details about the storage benchmark and its options are available in the built-in help of the beegfs tool.