Using Allas to host a data set for a research project
An example scenario of an Allas use case.
Roles of the play
Saara: A professor coordinating an inspiring research project.
Pekka: A researcher who takes care of the data management of the project.
Mats: A technician working at Analysis Service Center.
Xi and Laura: Researchers working in the research project.
Act 1. Professor Saara opens CSC projects
Professor Saara is running a large research project called HiaNo at a Finnish university. The project has just sent a set of samples to Analysis Service Center to be processed and analyzed. The analysis takes a few weeks and produces 80 TB of data that the research group will use in the actual research.
Saara and Pekka, who is taking care of the data management, study the storage options provided by CSC. They decide to use the Allas service for storing and sharing the data during the research project. The data is not sensitive personal data, so Allas is suitable.
Then Saara creates two research projects at CSC: one called Data management of the HiaNo project (project ID: project_2000444) and another called HiaNo research project (project ID: project_2000333).
Once the CSC projects are established, Saara activates the Allas, Puhti and cPouta services for both projects. As Saara knows that the default storage space of Allas (10 TB) will not be enough for the incoming data set, she sends a request for 90 TB of Allas quota for the project Data management of the HiaNo project to email@example.com.
Finally, Saara adds Pekka to both CSC projects and asks him to take care of the details of the incoming data.
Act 2. Creating a shared bucket
Mats from Analysis Service Center contacts Pekka, tells him that the results are available, and asks how he should deliver the data. Mats has an account at CSC (msundber in the project project_2000111) with Allas enabled, so Pekka proposes that the data be uploaded to Allas. For that purpose, Pekka creates a bucket in Allas and allows Mats to use it.
Pekka logs in to Puhti and opens a connection to the data management project in Allas:
module load allas
allas-conf project_2000444
Then he creates a new bucket in Allas. There are many ways to do this; this time, Pekka creates the bucket by uploading a new file to Allas with a-put:
echo "This bucket is used to host the original data of HiaNo project sample1" > README.txt
a-put --nc -b hiano-project-sample001 README.txt
a-list hiano-project-sample001
Pekka included the project name in the bucket name (hiano-project-sample001) to make sure that the bucket name is unique in the whole Allas service. The a-list command shows that the bucket was successfully created.
Next, Pekka uses the swift post command to modify the access rights of the new bucket so that Mats (user msundber from Allas project_2000111) is able to use it.
swift post hiano-project-sample001 -r "project_2000444:*,project_2000111:msundber"
swift post hiano-project-sample001 -w "project_2000444:*,project_2000111:msundber"
swift stat hiano-project-sample001
In Allas, large files (over 5 GB) are split during upload and stored as several objects in a separate bucket, which is normally created automatically. Its name is the name of the original bucket followed by the extension _segments. In this example, the name would be hiano-project-sample001_segments. Normally, users should not interact directly with segments buckets, but this case is an exception. Pekka now creates the segments bucket manually as well, to ensure that it is created (and thus owned) by the same project and to be able to set access rights for it.
a-put --nc -b hiano-project-sample001_segments README.txt
a-list hiano-project-sample001_segments
swift post hiano-project-sample001_segments -r "project_2000444:*,project_2000111:msundber"
swift post hiano-project-sample001_segments -w "project_2000444:*,project_2000111:msundber"
swift stat hiano-project-sample001_segments
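Because the same read and write ACL is applied to both the bucket and its segments bucket, the swift post calls above could be wrapped in a small helper. The following is only a sketch: the DRY_RUN variable and the function names run and share_bucket are illustrative assumptions, not part of the Allas tools. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: apply one ACL string to a bucket and to its _segments bucket.
# DRY_RUN, run and share_bucket are hypothetical names used for illustration.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

share_bucket() {
    bucket="$1"
    acl="$2"
    # The segments bucket follows the <bucket>_segments naming described above.
    for b in "$bucket" "${bucket}_segments"; do
        run swift post "$b" -r "$acl"   # read access
        run swift post "$b" -w "$acl"   # write access
    done
}

share_bucket hiano-project-sample001 "project_2000444:*,project_2000111:msundber"
```

Setting DRY_RUN=0 in an environment where allas-conf has already been run would execute the four swift post commands instead of printing them.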
Now Pekka has prepared a bucket (and the corresponding segments bucket) into which Mats can import the data. Pekka still needs to send the name of the bucket to Mats, because the normal Allas listing commands do not show the bucket to Mats, who is not a member of the project that owns it.
Act 3. Uploading data
Mats has the Allas tools installed on the front end server of the measurement device at Analysis Service Center. Thus he can upload the data directly from the front end server to the hiano-project-sample001 bucket in Allas:
rclone copy sample1/cannel43/aa_3278830.dat allas:hiano-project-sample001/sample1/cannel43/
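The command above uploads a single file. For a data set of this size, the upload could instead be scripted one directory at a time, so that an interrupted batch can simply be rerun (rclone copy skips files that already exist unchanged at the destination). The following is a minimal sketch: the function name upload_batches, the DRY_RUN variable and the directory sample1/cannel44 are hypothetical and used only for illustration.

```shell
#!/bin/sh
# Sketch: upload a data set to Allas one directory (batch) at a time.
# upload_batches, DRY_RUN and sample1/cannel44 are hypothetical names.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run rclone

upload_batches() {
    bucket="$1"
    shift
    for dir in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "rclone copy $dir allas:$bucket/$dir"
        else
            # Stop at the first failed batch so that it can be rerun later.
            rclone copy "$dir" "allas:$bucket/$dir" || return 1
        fi
    done
}

upload_batches hiano-project-sample001 sample1/cannel43 sample1/cannel44
```

Each directory keeps its relative path as the object name prefix in the bucket, which matches the layout used in the single-file command.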
As there is a large amount of data to transfer, the upload takes a few days and needs to be done in several batches. When Mats reports that the uploads are complete, Pekka closes the shared bucket:
swift post hiano-project-sample001 -r ""
swift post hiano-project-sample001_segments -r ""
swift post hiano-project-sample001 -w ""
swift post hiano-project-sample001_segments -w ""
swift stat hiano-project-sample001
Act 4. Using the data in research
Once the data is available, the actual analysis work begins. There will be several users using the data set during the research project. Pekka knows that if all users use the data with full access rights (read and write), there is a danger that somebody accidentally deletes or overwrites some part of the data. Thus, it is agreed that while the data is hosted by the data management project (project_2000444), the researchers access the data through the HiaNo research project (project_2000333).
Pekka gives read access to the hiano-project-sample001 bucket for the project project_2000333 but no write access.
module load allas
allas-conf project_2000444
swift post hiano-project-sample001 -r "project_2000333:*,project_2000444:*"
swift post hiano-project-sample001_segments -r "project_2000333:*,project_2000444:*"
Xi and Laura can now start working with the data. They register using the MyCSC portal, after which Saara, who is the Principal Investigator, adds them to the CSC project HiaNo research project (project_2000333).
Xi and Laura need to revisit MyCSC and accept the services of the research project. After that, they can download the research data they need to any environment that is able to connect to Allas: Puhti, a virtual machine in cPouta, or their own laptop. As new researchers join the project, Saara adds them to project_2000333 so that they can access the data.
Because storing data in Allas consumes billing units, Saara needs to check the billing unit balance in MyCSC from time to time and, if needed, apply for more billing units (80 TB consumes 700,800 BU per year). Fortunately, HiaNo is an academic research project, so Saara does not need to pay for the billing units.
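The yearly figure can be checked with simple arithmetic. The sketch below assumes a rate of 1 BU per TB per hour, which is the rate implied by the numbers in this example rather than a value stated in the text, so the current CSC pricing should be checked before relying on it.

```shell
#!/bin/sh
# Sketch: estimate the yearly billing unit consumption of stored data.
# Assumption: 1 BU per TB per hour (inferred from the figure in the text).
tb=80
bu_per_tb_hour=1
hours_per_year=$((24 * 365))

bu_per_year=$((tb * bu_per_tb_hour * hours_per_year))
echo "$bu_per_year"   # prints 700800
```

Changing tb to the amount actually stored gives a rough estimate for planning billing unit applications.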
Act 5. The end
After four years of intensive research that has expanded to several institutes in Finland and abroad, the HiaNo project has produced a few theses and many high quality publications (all acknowledging the use of CSC resources).
The data is no longer in active use. Part of the data that was imported to Allas has been published in international research databases. Some datasets have been moved to IDA so that a DOI and metadata can be attached to the data to make it reusable by other researchers. Some data can now be deleted, and the remaining parts can be moved to the buckets of the new HiaNo2 project.
At this stage, Pekka cleans the remaining data objects from Allas, after which Saara informs CSC that the project can be closed.