Ryugu Sample Database System (RS-DBS) on the Data Archives and Transmission System (DARTS) by the JAXA curation

The JAXA Astromaterials Science Research Group developed a web-based database system for the Hayabusa2-returned samples from the C-type asteroid Ryugu. The Ryugu Sample Database System (RS-DBS) is designed as an online catalog from which users across the scientific community can choose their preferred samples and propose sample loans through the JAXA Ryugu Sample Announcement of Opportunity. Through the Phase1 curation (i.e., the initial description), Ryugu samples are sorted and given identification numbers either as individual particles larger than 1 mm or as aggregate samples consisting of particles smaller than 1 mm. The RS-DBS lists all samples with analytical data such as microscopy images, size, mass, spectroscopic data, and shape models obtained by the initial description at the JAXA curation facility. The list also includes research results from previous projects (i.e., the Hayabusa2 initial analysis team and the Phase2 curation teams). The RS-DBS, built with open-source technologies, archives the data securely and for the long term on the Data Archives and Transmission System (DARTS) at ISAS/JAXA.


Introduction
The Japan Aerospace Exploration Agency (JAXA)'s Hayabusa2 spacecraft explored the C-type near-Earth asteroid (162173) Ryugu and collected ~5.4 g of samples in total at two surface locations (Morota et al. 2020; Tachibana et al. 2022; Yada et al. 2022). The samples were transported to the Extraterrestrial Sample Curation Center (ESCuC), Institute of Space and Astronautical Science (ISAS), JAXA (Abe 2021; Yada et al. 2023). They were investigated through the initial description (Phase1 curation) in a non-destructive manner and without contamination (Yada et al. 2022; Pilorget et al. 2022; Nakato et al. 2022). The collected samples were classified into three categories: (1) individual particles longer than 1 mm along the long axis, (2) aggregate samples consisting of particles shorter than 1 mm, and (3) gas samples stored in the gas tanks (Okazaki et al. 2017; Miura et al. 2022; Okazaki et al. 2022a).
The initial description (Yada et al. 2022; Pilorget et al. 2022; Hatakeda et al. 2023; Cho et al. 2022) includes the following preliminary analyses: (1) mass, (2) stereomicrograph, (3) size of each particle, (4) infrared reflectance spectroscopy, (5) infrared hyperspectral imaging using MicrOmega, (6) six-band multi-band imaging, and (7) shape modeling by stereo imaging for each particle. These descriptions were used for the sample allocation to the project-led initial analysis teams and to the Phase2 curation teams for further analyses. Note that 10% of the total mass of the Ryugu particles was transferred to the NASA Johnson Space Center curation office based on the memorandum of understanding (MOU) between JAXA and NASA. The entire initial description dataset was given a single digital object identifier (DOI) as a curatorial dataset (https://doi.org/10.17597/ISAS.DARTS/CUR-Ryugu-description, ASRG et al. 2022). Assigning a DOI to the dataset follows the FAIR Data Principles (Wilkinson et al. 2016).
The JAXA Astromaterials Science Research Group (ASRG) developed a web-based curatorial database system, the Ryugu Sample Database System (RS-DBS; https://darts.isas.jaxa.jp/curation/hayabusa2/). The RS-DBS consists of a file server, a database server, and a web server on the Data Archives and Transmission System (DARTS) (Miura et al. 2000). Similar databases for curation purposes have been reported for managing geochemical source datasets obtained by a series of analyses (Yachi et al. 2013; Uesugi et al. 2016). The RS-DBS has been and will be used by researchers to select the samples they propose to analyze through the JAXA Ryugu Sample Announcement of Opportunity (AO). All analytical results from the initial analysis teams, the Phase2 curation teams, and the samples allocated to the community through the AO will be added to the RS-DBS.
The samples allocated to the six sub-teams of the initial analysis team were investigated for a year, from June 2021 to May 2022. Advanced curatorial activities and descriptions by the two Phase2 curation teams started in parallel. Major scientific outcomes have already been published (e.g., Nakamura et al. 2022a; Yokoyama et al. 2022; Ito et al. 2022; Nakamura et al. 2022b; Okazaki et al. 2022a; Okazaki et al. 2022b; Noguchi et al. 2022; Naraoka et al. 2023; Yabuta et al. 2023). After the first-year analytical campaign, all the analyzed and/or processed samples were returned to ESCuC, with some exceptions: samples consumed by destructive analysis and samples made radioactive by neutron activation analysis. The returned samples will be available to the community through an upcoming AO.
In this report, we describe the contents and structure of the RS-DBS in detail.

Contents of the RS-DBS
The RS-DBS (Fig. 1) shows Ryugu particles with unique identification (ID) numbers together with selected analytical results (microscopic image, mass, FT-IR, MicrOmega, multi-band spectroscopy, and stereo imaging as of March 2023).

Sample information
Registered samples in the RS-DBS are classified as follows: (i) particles larger than 1 mm in size, (ii) aggregates of 5-10 mg in dishes, consisting of particles smaller than 1 mm, (iii) gas samples extracted from the sample container (Miura et al. 2022; Okazaki et al. 2022a), and (iv) processed samples returned after analyses from research groups such as the initial analysis teams, the Phase2 curation teams, and the AO participants. The samples classified as (iv) are in various forms, such as ultra-thin sections prepared by FIB (focused ion beam) or UMT (ultramicrotome), polished epoxy mounts, indium-pressed mounts, potted-butt epoxy mounts, IOM (insoluble organic matter) extractions, residues after sub-sample processing (e.g., fragmented particles), and liquid solutions.
The nomenclature of the Ryugu samples uses a prefix indicating the chamber of the sample catcher (i.e., A, B, or C) followed by a four-digit number that increases in the order of naming. A sub-sample derived from a parent sample is named with the parent name, a hyphen, and a string of alphanumeric characters and underscores designated by the researchers. Duplicate names are not allowed. Curatorial staff or researchers may assign common names as nicknames for samples, which can be registered in the RS-DBS. The relationships between a parent sample and its sub-samples, as well as the nicknames, are recorded in the database.
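The naming rule above can be illustrated with a short Python sketch. It is not the validation used by the RS-DBS; the regular expression is an assumption inferred from the description (chamber letter, four digits, optional hyphenated suffix of alphanumeric characters and underscores), and the function names are hypothetical.

```python
import re

# Assumed pattern inferred from the text: chamber letter (A, B, or C),
# a four-digit number, and an optional sub-sample suffix after a hyphen.
SAMPLE_NAME = re.compile(
    r"(?P<chamber>[ABC])(?P<number>\d{4})(?:-(?P<suffix>[A-Za-z0-9_]+))?"
)


def parse_sample_name(name):
    """Return (chamber, number, suffix); raise ValueError if malformed."""
    match = SAMPLE_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a valid sample name: {name!r}")
    return match.group("chamber"), int(match.group("number")), match.group("suffix")


def parent_name(name):
    """Return the parent name of a sub-sample, or None for a parent sample."""
    chamber, number, suffix = parse_sample_name(name)
    if suffix is None:
        return None
    return f"{chamber}{number:04d}"
```

For example, `parent_name("C0023-FIB_01")` yields `"C0023"`, which is the kind of parent and sub-sample link that the database records.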
Sample descriptions to be registered are based on the previously established database system for the asteroid Itokawa samples (Uesugi et al. 2016). The descriptions are (1) the sample name, which is a unique ID following the nomenclature described above, (2) sample sizes, such as the length (mm) and mass (mg) for a particle, the volume (mL) and pressure (Pa) in the bottle for a gas sample, and the volume (mL) for a liquid solution sample, (3) histories of analysis (e.g., FT-IR, MicrOmega, and other data from the initial description) and distribution sites (the initial analysis teams, the Phase2 curation teams, the AOs, and NASA), (4) the current sample container and storage names, (5) the current status describing sample quality (e.g., kept in the clean chamber or exposed to the atmosphere, unprocessed or processed), (6) availability for the AO and current loan status, and (7) published papers related to the sample. Details are shown in Table 1.

Analytical data
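The analytical data derived from the initial description (Yada et al. 2022) are archived in the RS-DBS for each sample. As of March 2023, there are six types of measurements: (1) microscopic images (NIKON SMZ1270i, Miyazaki et al. 2023) with the Feret diameter for each particle; (2) mass measured with an electric microbalance (Mettler-Toledo XP4042, Miyazaki et al. 2023); (3) an infrared reflectance spectrum over the 1 to 5 µm wavelength range obtained with the Fourier transform infrared spectrometer (FT-IR, JASCO VIR-300, Hatakeda et al. 2023); (4) an infrared hyperspectral image obtained with MicrOmega (Pilorget et al. 2022), which is a 256 × 250 pixel image with 22 µm spatial resolution and a 0.99 to 3.65 µm wavelength range; (5) multi-band imaging with the same filter set (ul: 0.39 µm, b: 0.48 µm, v: 0.55 µm, Na: 0.59 µm, w: 0.70 µm, x: 0.85 µm) as the Hayabusa2 optical navigation telescopic camera (ONC-T; Sugita et al. 2019; Cho et al. 2022); and (6) a 3D shape model by stereo imaging (Cho et al. 2022). Detailed items available on the web are shown in Table 2. All data processed in the initial description are stored in the file server on DARTS.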
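As a small worked example of the MicrOmega image geometry quoted above, the following Python sketch estimates the field of view from the pixel count and spatial resolution. It assumes that the 22 µm spatial resolution corresponds to the per-pixel scale, which the text does not state explicitly; the function names are hypothetical.

```python
# Values quoted in the text for the MicrOmega hyperspectral image.
PIXELS_X, PIXELS_Y = 256, 250
# Assumption: the 22 µm "spatial resolution" is the size of one pixel.
PIXEL_SCALE_UM = 22.0


def field_of_view_mm():
    """Return the approximate (width, height) of one frame in millimetres."""
    return (PIXELS_X * PIXEL_SCALE_UM / 1000.0,
            PIXELS_Y * PIXEL_SCALE_UM / 1000.0)


def fits_in_single_frame(max_feret_mm):
    """Conservative check: a particle whose maximum Feret diameter does not
    exceed the shorter side of the frame fits in any orientation."""
    return max_feret_mm <= min(field_of_view_mm())
```

Under this assumption one frame covers roughly 5.6 mm × 5.5 mm, so a particle with a maximum Feret diameter above about 5.5 mm may require more than one measurement position.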

Structure of the RS-DBS
The RS-DBS consists of three main components: (1) a file server, (2) a Relational Database Management System (RDMS), and (3) a web interface (Fig. 2). We used the Data Archives and Transmission System (DARTS) at ISAS/JAXA as the file server. PostgreSQL is used for the database system, and an Apache server with PHP (Hypertext Preprocessor) and JavaScript is used for the web interface. These well-known open-source technologies are de facto standards, so they are expected to reduce system development and maintenance costs and to be stable in operation.

File server
All the raw measurement data obtained in the initial description at the curation facility, i.e., the output files from the instruments without any post-processing, are archived and backed up on local disk drives. All the worklogs (operators' handwritten notes), including the operators' names, measurement dates, snapshots of the instrument parameters used, etc., are also stored on the local hard disk drives in PDF format. Almost all the raw data are processed to improve accessibility, usability, and comprehensibility for users. The available raw data and processed data are stored in a file server on DARTS, which maintains one directory per sample, named after the sample, containing measurement-result directories that each store a series of measurement data (Fig. 3). Data stored in DARTS will be archived securely for at least 30 years under the ISAS data security policy.
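The directory organization described above (one directory per sample, containing measurement-result directories) can be sketched in Python as follows. The root path, the result-directory label, and the file name are hypothetical placeholders, not the actual DARTS layout.

```python
from pathlib import PurePosixPath

# Hypothetical archive root; the real DARTS path is not given in the text.
ARCHIVE_ROOT = PurePosixPath("/archive/ryugu")


def measurement_dir(sample_name, result_label):
    """Directory holding one series of measurement data for a sample."""
    return ARCHIVE_ROOT / sample_name / result_label


def data_file(sample_name, result_label, filename):
    """Address of a single data file inside a measurement-result directory."""
    return measurement_dir(sample_name, result_label) / filename
```

A call such as `data_file("A0001", "FTIR_20210701", "spectrum.csv")` then gives a file address of the kind that a database record could point to.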

Relational Database Management System (RDMS)
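The RDMS of the RS-DBS manages a sample data table, measurement data tables, and a sample loan data table (Fig. 4), where the ID in the sample data table serves as a primary key, a function of PostgreSQL, for the relationships among the data tables. Each table has a subordinate table for describing data files. The sample data table holds the essential information required for curation and sample allocation. The measurement data tables hold descriptions of the measurements as listed in Table 2. The sample loan data table records the history of sample distribution outside the curation facility and manages on-loan samples. The subordinate tables for data files (i.e., the sample file data table, the measurement file data tables, and the sample loan file data table) register the addresses of files stored in the file server to link each database record to a designated data file. Data are generally updated two or three times a month. We developed a user-friendly data registration interface for the RDMS that allows operators to use a spreadsheet form instead of SQL commands.

Fig. 1 The web interface of the RS-DBS summarizes a set of analytical information on each sample. Users can sort the center table and select samples for a specific item and mass and size range. The table layout can be changed in the "Display style" panel on the top left (e.g., thumbnail style). The "Search constraints" panel on the left provides a keyword-based sample search function. The "Cart" checkbox in the left column of the table allows users to make a customized sample list. The table in this figure shows the search result for samples that include FT-IR, MicrOmega, multi-band images, and stereo images

Table 1 Sample descriptions available on the web. The descriptions contain the name, size or amount, analysis history, storage, condition, availability for the Ryugu Sample AO, and scientific reference information related to the sample

Sample name: The sample name should be unique, in the form X0000. The capital letter X represents the chamber where the sample was recovered, followed by the four-digit sequential number 0000. When the sample X0000 is divided into sub-samples, a new sample name X0000-abc is given to each sub-sample. No format restriction applies after the hyphen
Tentative name: A temporary/unofficial name or nickname of the sample that requires no specific format. This is given for traceability of the sample handling record because some samples are handled with a temporary name before being given the official sample name
Chamber: The name of the chamber of the sample catcher (A, B, or C) where the sample was recovered
Sample form: The form of the sample, such as "individual particle," "aggregate," "gas," and "previously allocated sample." "Previously allocated sample" indicates that the sample has been analyzed after the initial description and is the general term for "polished section," "FIB section," and so on
Condition: The current sample condition (pristine or not). The term "pristine" here implies that the sample has neither been exposed to air nor experienced any destructive analysis but has only undergone the initial description measurements (e.g., optical microscope, MicrOmega, and FT-IR)
Weight (mg): The sample weight (mg) is given here if it was remeasured after the initial measurement
Size: long length (mm): The maximum Feret diameter, also known as the maximum caliper length, of an individual particle (mm), measured with ImageJ (Schneider et al. 2012) on a microscope image captured in the clean chamber. This information is given only for individual particles, not for aggregates
Size: short length (mm): The minimum Feret diameter, also known as the minimum caliper length, of an individual particle (mm), measured with ImageJ (Schneider et al. 2012) on a microscope image captured in the clean chamber. This information is given only for individual particles, not for aggregates
Size: height (mm): The height (mm) measured by focusing on the top and bottom of the particle using the optical microscope. This information is given only for individual particles, not for aggregates
Pressure (Pa): The sample gas pressure inside the gas container. Available for gas samples, not for solid samples
Volume (mL): The sample volume. Available for gas or liquid samples, not for solid samples
Status: The current sample status (on-loan, transfer, lost, or consumed). The term "consumed" means consumption of a sample by a destructive analysis
Measurement history: A brief description of analysis records
Distribution history: A brief description of distribution records
Container: The ID of the sample dish
Storage: The name of the current sample storage
Quality: An identifier of the level and a brief description of the sample's cleanliness. Class-1 or -2 indicates the cleanliness level. Three capital letters in brackets after the class number describe the environmental conditions that the sample has been exposed to, with three ranks from A to C. For example, "Class-1 (AAA)" describes the best conditions. Cleanliness level: (1) the sample has been handled with the standard process; (2) the sample has been handled with a nonstandard process (e.g., the sample was accidentally touched with a Teflon material). The first letter explains the atmospheric condition: (A) under vacuum or purified nitrogen; (B) other gases; (C) atmospheric air on the Earth. The second letter explains the contact materials of the sample: (A) quartz glass, sapphire glass, stainless steel, and aluminum; (B) Teflon; (C) other materials (such as Viton). The third letter explains the cleaning process of the contact materials of the sample: (A) a combination of ultrasonic, degreasing, alkali solvent, and ozone cleaning; (B) a combination of ultrasonic cleaning and degreasing; (C) other cleaning methods or no cleaning
Description: Brief description of the sample
Reference: Access to the published papers related to this sample
Data source: Access to the directories of the DARTS file server for the entire processed analytical data of the sample
Family tree (Parent and child): A description of the relationship among the sample, the sample source (mother sample), and the sub-samples. A sample has several parent samples when it is a mixture of several sample sources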
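The quality identifier described in Table 1 can be decoded mechanically. The Python sketch below is only an illustration: the exact string format is assumed from the single example "Class-1 (AAA)", and the class and function names are hypothetical.

```python
import re
from dataclasses import dataclass

# Rank meanings paraphrased from the Quality entry of Table 1.
ATMOSPHERE = {"A": "vacuum or purified nitrogen", "B": "other gases",
              "C": "atmospheric air"}
CONTACT = {"A": "quartz glass, sapphire glass, stainless steel, or aluminum",
           "B": "Teflon", "C": "other materials"}
CLEANING = {"A": "ultrasonic, degreasing, alkali solvent, and ozone cleaning",
            "B": "ultrasonic cleaning and degreasing",
            "C": "other cleaning methods or no cleaning"}

# Assumed format, based on the example "Class-1 (AAA)".
_QUALITY = re.compile(r"Class-([12]) \(([ABC])([ABC])([ABC])\)")


@dataclass(frozen=True)
class Quality:
    cleanliness_class: int  # 1: standard process, 2: nonstandard process
    atmosphere: str
    contact: str
    cleaning: str


def parse_quality(text):
    """Decode a quality identifier; raise ValueError if it is malformed."""
    match = _QUALITY.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid quality identifier: {text!r}")
    level, first, second, third = match.groups()
    return Quality(int(level), ATMOSPHERE[first], CONTACT[second], CLEANING[third])
```

For instance, `parse_quality("Class-2 (CBC)")` describes a sample handled with a nonstandard process, exposed to atmospheric air, and in contact with Teflon.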

Web interface
The web interface (Fig. 1) is designed to give user-friendly access to the information: (1) users can select a view style, such as a sample list with or without images, a thumbnail list, and a detailed data sheet for each sample; (2) users can search for samples with specific characteristics such as name, form, measurement history, size range, and mass range (a search function for sample compositions is not yet available; it could be implemented after the sample compositions are categorized based on the initial description); and (3) users can sort the sample list by specific characteristics. The displayed table can be downloaded as a comma-separated values (CSV) file. An export function of the web interface is also available to download all sample and measurement descriptions in a single JavaScript Object Notation (JSON) file.
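Once a table has been downloaded as CSV, the same kind of selection by mass range and measurement history can be repeated offline. The following Python sketch assumes hypothetical column names and uses made-up placeholder rows; the actual column layout of the RS-DBS export is not specified in the text.

```python
import csv
import io

# Placeholder rows with hypothetical column names (not real RS-DBS data).
SAMPLE_CSV = """sample_name,sample_form,weight_mg,measurements
A0001,individual particle,2.1,FT-IR;MicrOmega
A0002,individual particle,0.9,FT-IR
C0005,aggregate,7.4,FT-IR;MicrOmega
"""


def select_samples(csv_text, min_weight_mg, max_weight_mg, required_measurement):
    """Return the names of samples within a mass range that have a given measurement."""
    selected = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        weight = float(row["weight_mg"])
        measured = row["measurements"].split(";")
        if min_weight_mg <= weight <= max_weight_mg and required_measurement in measured:
            selected.append(row["sample_name"])
    return selected
```

With the placeholder rows above, `select_samples(SAMPLE_CSV, 1.0, 10.0, "MicrOmega")` keeps the two samples that are at least 1 mg and have a MicrOmega record.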

Concluding remarks
We have developed a web-based curatorial database system for the Hayabusa2-returned samples (RS-DBS) as a sample catalog from which users worldwide can choose preferred samples and propose them for a loan through the JAXA Ryugu Sample AO. The RS-DBS describes the curatorial information, each sample's characteristics, data and analysis history, and sample loan status. Analytical data are securely stored in DARTS for the long term, no less than 30 years. We have assigned a DOI to the Ryugu sample dataset and continue to follow the FAIR Data Principles. With respect to these principles, there is room for improvement in applying standardized and searchable metadata to individual analytical data and in assigning persistent identifiers (PIDs), such as the International Generic Sample Number (IGSN; IGSN organization), to individual samples. The JAXA curation will improve the web interface to make it more user-friendly, as one of the curation activities to maximize the science output from future returned samples, i.e., those of NASA's Origins, Spectral Interpretation, Resource Identification, Security-Regolith Explorer (OSIRIS-REx) (Lauretta et al. 2019) and JAXA's Martian Moons eXploration (MMX) (Usui et al. 2020) missions. The RS-DBS is expected to improve through continuous data registration from the Phase1 and Phase2 curation, the initial analysis, and the JAXA Ryugu Sample AO activities.
Table 2 Analytical data available on the web. There are six types of measurements derived from the Initial Description (Yada et al.

Comment: A remark for the measurement or data

Weight
Date and time: The date and time when the measurement was performed
Total weight (mg): The sample weight (mg), including the sample dish weight, measured by the electric balance in the clean chamber. This is an average of five weight measurements
Sample dish weight (mg): The sample dish weight (mg) measured by the electric balance in the clean chamber. This is an average of five measurements
Sample weight (mg): The sample weight (mg), excluding the sample dish weight, measured by the electric balance in the clean chamber. This is an average of five weight measurements
Sample weight error (mg): The calculated standard deviation, given as the measurement error of the sample weight (mg)
Dish: The ID of the sample dish
Comment: A remark for the measurement or data

FT-IR
Date and time: The date and time when the measurement was performed
Spectrum image: A spectrum chart of the sample measured by the FT-IR system. The chart shows the relative reflectance intensity per wavelength (µm)
ROI image: Image file of the sample taken by the FT-IR system. This optical image shows the measured region of interest indicated by the laser guide of the FT-IR system
ROI light image: Image file of the sample taken by the FT-IR system. This optical image shows the measured region of interest indicated by the laser guide and the supplemental light of the FT-IR system
CSV file: A CSV file of the spectrum image, including the information on measurement conditions
Comment: A remark for the measurement or data

MicrOmega
Date and time: The date and time when the measurement was performed
Position: The number of the measurement position. When the target sample is larger than the field of view of MicrOmega, this position number is used to identify the measurement area during measurement
Angle (°): The sample's horizontal rotation angle (degree) on the sample stage. This value is relative and is used to identify the measurement position more easily during measurement
Monochromatic image: Image file of the sample captured by MicrOmega.

Multi-band spectroscopy
Date and time: The date and time when the measurement was performed
Measurement condition CSV: A CSV file describing the information on measurement conditions
IF map image: Reflectance (radiance factor or I/F) map at 550 nm (v-band) measured with an incidence angle of 30°, an emission angle of 0°, and a phase angle of 30°. The photometric effect due to the roughness of the sample surface is not corrected. The solid line outlines the particle rim and the dashed line represents the region used to calculate the particle-averaged spectrum
Color ratio image: Color map showing the v-to-b band ratio (R550/R480 nm). The solid line marks the particle rim and the dashed line shows the region used to calculate the particle-averaged spectrum
Average spectrum image: Particle-averaged spectrum taken by the multi-band spectroscopy system in absolute reflectance (left) and reflectance normalized at 550 nm (right). The spectrum of each particle (solid black line) is compared with those of the brightest (orange dashed line), darkest (green dashed line), most red-colored (red dashed line), and most blue-colored (blue dashed line) particles. Error bars show the measurement uncertainty reported by Cho et al. (2022). The photometric effect due to the roughness of the sample surface is not corrected
Comment: A remark for the measurement or data

Stereo imaging
Date and time: The date and time when the measurement was performed
Elevation map image: Microscopic image (left) and elevation map (right) of the particle. The elevation map was calculated from a 3D digital elevation model using a structure-from-motion technique, and the sample was imaged from multiple angles
Comment: A remark for the measurement or data

Fig. 2 Components and workflow of the RS-DBS. The RS-DBS consists of a relational database management system (RDMS), a file server, and a web interface/server. The workflow starts from the initial description work in the clean chamber. All analytical data are stored in the file server. Further data processing is performed locally at the curation facility for some data to be registered in the database servers. The data in the file servers are displayed through the web interface, and users can access the sample catalog and its search functions through the internet
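The table relationships described in the RDMS section (a sample data table whose ID keys the measurement and loan tables, each with a subordinate table of file addresses) can be sketched as follows. This is a minimal illustration, not the actual RS-DBS schema: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, shows only one measurement table, and all table and column names are hypothetical.

```python
import sqlite3

# Hypothetical, simplified schema; the real RS-DBS tables are not published here.
SCHEMA = """
CREATE TABLE sample (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE          -- duplicate sample names are not allowed
);
CREATE TABLE measurement_ftir (
    id          INTEGER PRIMARY KEY,
    sample_id   INTEGER NOT NULL REFERENCES sample(id),
    measured_at TEXT
);
CREATE TABLE sample_loan (
    id          INTEGER PRIMARY KEY,
    sample_id   INTEGER NOT NULL REFERENCES sample(id),
    destination TEXT
);
-- Subordinate table: addresses of files kept on the file server.
CREATE TABLE measurement_ftir_file (
    id             INTEGER PRIMARY KEY,
    measurement_id INTEGER NOT NULL REFERENCES measurement_ftir(id),
    file_address   TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce references
    conn.executescript(SCHEMA)
    return conn


def files_for_sample(conn, sample_name):
    """Return the file addresses linked to a sample through its FT-IR records."""
    rows = conn.execute(
        """SELECT f.file_address
           FROM sample AS s
           JOIN measurement_ftir AS m ON m.sample_id = s.id
           JOIN measurement_ftir_file AS f ON f.measurement_id = m.id
           WHERE s.name = ?
           ORDER BY f.file_address""",
        (sample_name,),
    )
    return [row[0] for row in rows]
```

Keeping file addresses in subordinate tables, rather than in the measurement records themselves, lets one measurement point to any number of files on the file server.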