Experiments using Semantic Web technologies to connect IUGONET, ESPAS and GFZ ISDC data portals

E-science on the Web plays an important role and offers the most advanced technology for the integration of data systems. It also makes data available for research into increasingly complex aspects of the earth system and beyond. The great number of e-science projects funded by the European Union (EU), university-driven Japanese efforts in the field of data services and institutionally anchored developments for the enhancement of sustainable data management in Germany are proof of the relevance and acceptance of e-science or cyberspace-based applications as a significant tool for successful scientific work. This paper focuses on the collaboration activities related to near-earth space science data systems, and on first results in the field of information science, between the EU-funded project ESPAS, the Japanese IUGONET project and the GFZ ISDC-based research and development activities. The main objective of the collaboration is the use of a Semantic Web approach for the mashup of the project-related data systems, which are so far not interoperable. Both the development and use of mapped and/or merged geo and space science controlled vocabularies and the connection of entities in ontology-based domain data models are addressed. The developed controlled vocabularies for the description of geo and space science data and related context information, as well as the domain ontologies themselves with their domain and cross-domain relationships, will be published in Linked Open Data.

Graphical abstract Semantic Web based mashup of the earth and space science related Japanese IUGONET, European Union ESPAS and GFZ ISDC data systems and services.


Introduction
One of the main challenges of geo and space science activities is improving our understanding of the complex processes of the earth system including its interaction with solar-driven impacts, such as climate change or space weather. This requires an interdisciplinary approach which connects relevant and related data in the different geo and space science domains. Most of the geo and space domains have mature information models for describing available resources. Discovering available resources in multiple domains is a challenge which requires a level of expertise and knowledge of the individual data systems in each domain. This challenge can be met by the integration of the different geo and space science domains using Semantic Web-based mashup of the appropriate data and models (Allemang and Hendler 2008).
Scientific research has entered the fourth paradigm (Hey et al. 2009) and is increasingly driven by real data. There is an exponential growth of data (IDC White Paper 2011), with terabytes of data generated daily by sensors, digital models and social networks. This presents another type of challenge for the integration of systems because data are now Big Data (IDC White Paper 2011). This new paradigm has two contrary sides. On the one hand, scientists are pleased about the potential of using more and more data from different domains; on the other hand, most data are not described and structured in a way that allows machine-based combination. Furthermore, the tools for finding, accessing and connecting such large amounts of data are not fully available. This challenge can be met by using the Resource Description Framework (RDF) (RDF Working Group 2004) standard as a metadata information model which is used by Semantic Web technology (Allemang and Hendler 2008) to automatically connect data systems and data. Another major reason for doing this research is the fact that standards implementation is at best patchy, and as a result, ontological mediation such as described here can be useful to address deficiencies and variations in quality of standards implementation.
A Semantic Web approach also addresses other related challenges. One is the development of a new culture of cooperative scientific work which is connected through the Web. With a Semantic Web, coherent research collaboratives can be formed that combine data, publications and social networks. 1,2 Also, while English is the main language in the field of science, scientific work is a personally organized effort, often discussed and reasoned in the researcher's primary language. This means that researchers use different vocabularies in different languages for the description of their research topics, results, applications and underlying data. Semantic Web technology (Allemang and Hendler 2008; Hebeler et al. 2009; Hitzler et al. 2008) can provide a solution by defining explicit expressions and connecting the different vocabularies using SKOS (W3C 1994-2012), RDFS (Brickley and Guha 2014) and OWL (OWL Working Group 2012).
In this paper, we mainly describe the GFZ ISDC efforts 3,4 to develop a Semantic Web-based data system using the ISDC ontology network. 5 This work is the initial part of a planned Semantic Web technology-based connection of the ESPAS (European Commission, Research & Innovation, Research Infrastructures 2014), 6 IUGONET 7 (Abe et al. 2014; Yatagai et al. 2015) and GFZ ISDC 8 data portals (Hapgood and Iyemori 2013). A fruitful collaboration with the University of Applied Sciences Potsdam, Department of Information Sciences, in research and education forms the basis for this project.
The first activities involving the data modeling tasks for the ISDC ontology were started around 5 years ago. The first version of the ISDC ontology for mapping the information model of the ISDC repository was published in 2010 (Pfeiffer 2010). A Semantic Web-based data portal was developed using the Virtuoso Universal Server 9 triple store and Drupal CMS in 2013. 10 The ISDC ontology and services were used to form connections to IUGONET, ESPAS and GFZ ISDC resources.

E-science projects-IUGONET, ESPAS and GFZ ISDC
To explore the use of Semantic Web technologies, a proof-of-concept project GFZ ISDC 11 was formed. The goal was to explore how to form a science collaborative using the IUGONET project 12 (Abe et al. 2014; Yatagai et al. 2015), the European Union ESPAS project 13 and the GFZ ISDC. This chapter describes the main requirements for scientific data systems and explains the background and main goals of the Japanese IUGONET project 14 (Abe et al. 2014; Yatagai et al. 2015), the European Union ESPAS project 15 and the GFZ ISDC (Ritschel et al. 2008a).

Requirements for e-science infrastructure
The main scientific and technical objectives for e-science or cyberspace projects are to improve the domain-specific data management systems and make all resources available on the Web. Often data systems are responsible for sustainable ingestion, storage and provision of data. These systems usually have a specific data use policy. A basic service is to have data catalogs that describe repositories and data harvested from available metadata and context information. These catalogs can be searched for data and metadata and provide methods to access the data either anonymously or through authenticated channels. Some systems offer the publishing of data and the connection of data and publication as value-added services. Such systems are often based on common Content Management System (CMS) platforms like Typo3 16 or Drupal. 17 Additional value-added services such as a moderated user forum or RSS-feed services may be offered. Interoperability between data systems is possible only if the systems are based on the same standards, for example, the same information model and a standardized service.
Additional motivations for open accessibility of data are reproducibility in science and better return on investment from tax-funded research.

IUGONET project
The Japanese Inter-university Upper Atmosphere Global Observation Network IUGONET 18 (Abe et al. 2014; Yatagai et al. 2015) project unifies the efforts of four Japanese universities from Kyoto, Nagoya, Tohoku and Kyushu and the National Institute for Polar Research. Its goal is to design, implement and operate a data system for the enhancement of the provision of mainly upper atmosphere and geomagnetic data. All project partners are responsible for the operation of specific ground-based observatories and instruments which are the basis for the geophysical data within the IUGONET data repository. The leading institution for the design and operation of the IUGONET data system, called the metadata database (MDB), is the WDC/WDS for Geomagnetism of the Kyoto University. 19 The 6-year research project IUGONET started in spring 2009. It is planned to continue the project with the addition of DOI 20 -based publishing of scientific data.

ESPAS project
The Near-Earth Space Data Infrastructure for e-Science ESPAS 21 project was funded by the European Union's Seventh Framework Program. The main objective is the design and implementation of an e-science infrastructure for distributed near-earth space data resources. The project started in November 2011 and will end in November 2015. There are more than 20 partners, mostly scientific institutions from all over Europe. The project is mainly driven by the RAL Space Department of the STFC's Rutherford Appleton Laboratory and the National and Kapodistrian University of Athens including the National Observatory of Athens. The tasks of the participants in the project vary from data provider and information modeler to software developer and system operator. More than 40 existing data repositories cover data, from the atmosphere to the outer radiation belts, measured by ground-based instruments and satellites. The data providers mainly contribute metadata to a centralized ESPAS data system 22 which is still in development. Besides a catalog service and an access service to selected data, value-added services are part of the planned infrastructure (ESPAS 2013).
18 http://www.iugonet.org/en/.
22 https://www.espas-fp7.eu/portal/index.html.

GFZ ISDC project
The Information System and Data Center ISDC 23 of the Helmholtz Centre Potsdam-GFZ German Research Centre for Geosciences is an operational data portal for geoscientific data with corresponding metadata, scientific documentation and software tools (Ritschel et al. 2008a). The majority of the data and information are global geomonitoring products such as satellite orbit and earth gravity field data as well as geomagnetic and atmospheric data from GFZ-affiliated projects. It includes data from the Challenging Minisatellite Payload (CHAMP) low earth orbit satellite, 24 the twin Gravity Recovery And Climate Experiment (GRACE) low earth orbit satellites, 25 Global Navigation Satellite Systems (GNSS), 26 Global Geodynamic Project (GGP), 27 Global Geodetic Observing System (GGOS), 28 TerraSAR-X (TSX) 29 and other data associations.

Metadata for IUGONET, ESPAS and GFZ ISDC data portals
This chapter deals with information about the data portals of the IUGONET, ESPAS and GFZ ISDC projects. This includes the metadata, data models and the system architectures used in the ISDC Semantic Web framework.

Metadata formats and data models
Metadata or context data are used for the description of data. Such descriptions contain both information about the data itself, such as content information, start and stop time or spatial coverage of the measurement, and information about entities. It may also include descriptions of resources which are involved in the overall creation process, such as instruments and platforms, persons, institutions and projects. Metadata are also used to document parts of the data life cycle, such as the generation of knowledge in form of scientific publications. Data models, also known as information models, are the basis for system architectures of data systems. For the management of data repositories, underlying concepts and relationships of appropriate entities are modeled. There are some standards for geoscience-related metadata and data models, such as the DIF standard from NASA (DIF 2013), the ISO 19115 standard for metadata 30 and the Observations and Measurements (O&M) data model standard from OGC/ISO. 31 In addition to structural standards for metadata and models, controlled terms or vocabularies are used for keyword-based tagging or indexing of entities. Examples of such vocabularies are the GCMD science keywords from NASA (Olsen et al. 2013) or the "allowed values" derived from the Space Physics Archive Search and Extract (SPASE) standard (King et al. 2010).
23 http://isdc.gfz-potsdam.de.
24 http://www.gfz-potsdam.de/champ.

IUGONET common metadata format and model
The IUGONET data portal is based on the SPASE metadata and the SPASE data model (King et al. 2010). SPASE is a heliophysics community-based project for the design, implementation and operation of an e-science infrastructure in the heliophysics domain. The corresponding data model is used for the creation of data set descriptions for data collections. Main entities are data resources (numerical data, display data, catalog, granule and annotation), originating resources (observatory, instrument, person and document) and infrastructure resources (registry, repository and service). The SPASE data model specification 32 includes a conceptual ontology, shown in Fig. 1, with the primary implementation as an XML schema. Version 2.0.2 of the SPASE XML schema 33 was the basis for Version 1.0.0 34 of the IUGONET XML schema and the IUGONET common metadata format (Abe et al. 2014). 35 In the SPASE data model, all resource entities have a unique resource identifier (URI) and are described using the XML format. Recently, the IUGONET data model has been extended to include references to ORCID 36 and DOI 37 to enable connections between authors, publications and data. An important part of the metadata and the data model is the use of controlled vocabularies for classification and keyword-based search of entities. IUGONET uses both the SPASE keywords and GCMD science keywords.

ESPAS metadata and data model
The metadata used for the description of ESPAS entities are mainly based on the ISO 19115 standard (Geographic Information-Metadata). 38 The ESPAS data model (ESPAS 2013) uses ISO standards, such as ISO 19101:2002 (Geographic information-Reference model) 39 and ISO 19109:2005 (Geographic information-Rules for creating and documenting application schemas). 40 The model is also partly based on the ISO 19156 Observations and Measurements (O&M) standard. 41 Core classes or entities of the O&M standard, which are also used for the ESPAS model, are feature of interest, observed property, observation result and designated procedure. In summary, the ESPAS data model version 2.0 consists of the following concepts: organization, individual, project, instrument, platform, operation, acquisition process, computation process, composite process, collection and observation. The terminological ESPAS ontology 42 provides a controlled vocabulary for the near-earth space domain related to phenomena and observed properties. The terminological ESPAS ontology is modeled using the Semantic Web standard Simple Knowledge Organization System (SKOS) (W3C 1994-2012) for keyword collections, classifications and thesauri.
30 http://www.iso.org/iso/home/store/catalogue_ics/catalogue_detail_ics.htm?csnumber=53798.
31 http://www.opengeospatial.org/standards/om.
32 http://www.spase-group.org/data/dictionary/spase-2_2_2.pdf.

GFZ ISDC DIF standard and data model
The design of the operational GFZ ISDC data system 43 was based on NASA's DIF metadata standard (Directory Interchange Format (DIF) Writer's Guide 2013), mainly used for the GCMD and appropriate services. The DIF standard includes information about the data sets, such as title, temporal and spatial coverage, quality, access and use constraints, but also about instruments, platforms, projects, persons and data centers. An Entry ID is used for the identification of conforming DIF standard metadata documents. In former versions of the DIF standard, ASCII text was used. The recent version is available as DIF XML schema (Mende et al. 2008). The DIF standard is valid only for a collection of data or data sets called product types. In order to overcome this limitation, the GFZ ISDC derived an enhanced model to include information about granules or data products, such as a unique identifier, temporal and spatial coverage, revision and software version. Figure 2 shows the extension of the main DIF classes which form the ISDC DIF standard. The data model of the ISDC data portal is a relational data model and is implemented using a relational database management system (Ritschel et al. 2008a). The GFZ ISDC data catalog mainly consists of product type-related tables extended by aggregated tables for enhanced search capabilities. The ISDC metadata documents for product types benefit from the use of GCMD science keywords.
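The relationship between a product type (the collection level covered by DIF) and its data products (the granule level added by the ISDC extension) can be illustrated with a minimal Python sketch. All class and field names below are hypothetical and chosen only to mirror the attributes listed above; they are not taken from the ISDC DIF schema or from the relational implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Tuple


@dataclass
class DataProduct:
    """One granule. Field names are illustrative assumptions."""
    identifier: str                                   # unique identifier of the granule
    start_time: datetime                              # temporal coverage (start)
    stop_time: datetime                               # temporal coverage (stop)
    bounding_box: Tuple[float, float, float, float]   # west, south, east, north
    revision: int
    software_version: str


@dataclass
class ProductType:
    """A DIF-style collection description extended with its granules."""
    entry_id: str                                     # DIF Entry ID
    title: str
    science_keywords: List[str] = field(default_factory=list)
    products: List[DataProduct] = field(default_factory=list)

    def products_between(self, start: datetime, stop: datetime) -> List[DataProduct]:
        """Return the granules whose time range overlaps [start, stop]."""
        return [p for p in self.products
                if p.start_time <= stop and p.stop_time >= start]
```

A granule-level search, which the collection-level DIF record alone cannot answer, then reduces to a call such as `product_type.products_between(start, stop)`.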

GFZ ISDC: Semantic Web proof of concept
Recognizing both the usefulness of each of the previously described data portals and the complementary nature of their content, we set out to interconnect the ESPAS, IUGONET and GFZ ISDC data portals. Our analysis showed that while each system used different metadata, conceptually there was a great deal of commonality. The ideal approach to achieving interoperability would be to form a Semantic Web.

Semantic Web stack and standards
From its inception in 1991, the WWW (Lee et al. 1992; Shadbolt et al. 2006) quickly became the standard infrastructure of the Internet. The World Wide Web Consortium (W3C), 44 with Tim Berners-Lee as its director, is the standardization body for the WWW specifications. An implementation of the WWW specifications is commonly referred to as a Web. One of the core WWW specifications is for Uniform Resource Identifiers (URIs), or more specifically Uniform Resource Locators (URLs), which are used to identify and address documents in the Web. The Hypertext Transfer Protocol HTTP 45 is responsible for the communication within the Web. This application layer protocol connects resources using hyperlinks in HTML documents. This allows HTML documents in the Web to be connected using links. This works exceptionally well, in part because the Web was created for interaction with human readers. However, there are no explicit semantics of the elements and links of a Web page.
44 http://www.w3.org/.
Adding semantics to the Web will allow data to be shared and reused across current boundaries. The technology stack to add semantics is referred to as the Semantic Web. 46 The base technology is the Resource Description Framework (RDF) standard (RDF Working Group 2004). For data interchange, RDF connects Web resources with specific properties which link to other resources or just literals (strings or numbers). An example is the connection of an author and a book using a triple consisting of subject, predicate and object. Just like in natural language: The author (subject) is Creator (predicate) of the book (object). Each element of the triple may be a resource referenced with a URI. A formal representation or model of knowledge in a real world domain is called an ontology 47 (Gruber 1995). The design of an ontology may be described with RDF Schema (RDFS) (Brickley and Guha 2014) or the Web Ontology Language OWL (OWL Working Group 2012). RDFS and OWL extend the features of RDF by the introduction of classes and subclasses. Subproperties and logical constructs, such as inverse, symmetric, transitive, disjoint and equivalent, provide inference capability based on first-order predicate logic. Specific elements of OWL, such as "owl:sameAs," are used to connect entities from different ontologies. Populating an ontology with individuals creates a knowledge base. A knowledge base can be accessed and queried using the RDF Query Language SPARQL (2008). With SPARQL, individuals can be retrieved and manipulated according to rules defined in the Rule Interchange Format RIF. 48 The highest layers in the Semantic Web stack, such as unifying logic, proof and trust, are still in an experimental status and not yet realized.
45 http://www.w3.org/Protocols/.
46 http://www.w3.org/2004/Talks/1117-sb-gartnerWS/slide18-0.html.
47 http://queksiewkhoon.tripod.com/ontology_01.pdf.
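The author and book example can be written down as a minimal Python sketch of the subject-predicate-object structure. The namespace `http://example.org/` and the property names `isCreatorOf` and `title` are hypothetical placeholders, not terms from an existing vocabulary, and the plain-tuple representation is only a stand-in for a real RDF library or triple store.

```python
from typing import NamedTuple, Set


class Triple(NamedTuple):
    subject: str     # always a resource (URI)
    predicate: str   # always a resource (URI)
    obj: str         # a resource (URI) or a literal written in double quotes


EX = "http://example.org/"  # hypothetical namespace used for illustration only

graph: Set[Triple] = {
    # "The author is creator of the book": resource linked to resource
    Triple(EX + "author/1", EX + "isCreatorOf", EX + "book/1"),
    # resource linked to a literal (a string)
    Triple(EX + "book/1", EX + "title", '"An Example Book"'),
}


def objects(graph: Set[Triple], subject: str, predicate: str) -> list:
    """Return all objects stated for a given subject and predicate."""
    return sorted(t.obj for t in graph
                  if t.subject == subject and t.predicate == predicate)


def to_ntriples(t: Triple) -> str:
    """Serialize one triple in an N-Triples-like line (no escaping handled)."""
    obj = t.obj if t.obj.startswith('"') else f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."
```

Because every resource is named by a URI, statements from different sources about the same URI can simply be collected into one set, which is the basis of the mashup approach discussed in this paper.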

LOD: Semantic Web application
Linked Open Data LOD (Hebeler et al. 2009) 49 is the best-known successful project and application in the Semantic Web and is based on the linked data principles defined by Tim Berners-Lee in 2006 (Hebeler et al. 2009; Christian et al. 2009; Berners-Lee 2006). These principles build on the Semantic Web standards and focus on the use and connection of URIs or Internationalized Resource Identifiers IRIs 50 as a way to make statements in RDF expressed as subject-predicate-object triples. Collections of statements can be evaluated and searched using query languages such as SPARQL (2008). When RDF expressions are defined for openly accessible resources, a LOD cloud can be defined (Jentzsch et al. 2011). One of the first applications was DBpedia (Lehmann et al. 2012). 51 DBpedia is the Semantic Web counterpart of Wikipedia in the Web. At present, DBpedia contains around 8.8 billion RDF triples describing more than 6 million entities, 52 mainly derived from the info boxes of Wikipedia. The DBpedia SPARQL endpoint 53 is used to connect DBpedia resources via SPARQL with other RDF resources in LOD. At present, LOD is composed of about 2200 data sets 54 mainly covering the domains of media, geographic, government, publication, cross-domain, life sciences and user-generated content (Jentzsch et al. 2011). In addition to GeoNames 55 and Linked GeoData 56 containing geographical information, there are also resources related to geo and space sciences, such as NASA Space Flight & Astronaut data in RDF, 57,58 and resources related to e-infrastructure projects, e.g., Linked Sensor Data (Kno.e.sis), 59 available in LOD.
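How a query language searches a collection of statements can be illustrated with a small Python sketch that matches a single triple pattern, roughly what a SPARQL query such as `SELECT ?s WHERE { ?s rdf:type ex:Satellite }` does for one pattern. The function `match`, the `ex:` prefix and the sample statements are assumptions made for illustration; a real SPARQL engine additionally handles joins over several patterns, filters and much more.

```python
def match(graph, pattern):
    """Return one variable binding (a dict) per triple that fits the pattern.

    Pattern terms starting with '?' are variables; all other terms must
    be equal to the corresponding term of the triple.
    """
    results = []
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # a variable used twice must be bound to the same value
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            results.append(binding)
    return results


# Illustrative statements with a hypothetical "ex:" prefix
graph = [
    ("ex:CHAMP", "rdf:type", "ex:Satellite"),
    ("ex:GRACE", "rdf:type", "ex:Satellite"),
    ("ex:CHAMP", "ex:carries", "ex:Magnetometer"),
]

satellites = [b["?s"] for b in match(graph, ("?s", "rdf:type", "ex:Satellite"))]
```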

Methods for design and mashup of data in the Semantic Web
Structured resources in the RDF format (RDF Working Group 2004), managed by a triple store which includes a SPARQL (2008) endpoint, are necessary for an efficient mashup of different entities. RDF data reflect the use of entities, such as classes or properties of one or more appropriate ontologies. For enhanced interoperability, it is best to adopt existing ontologies when available. Domain ontologies such as the Semantic Web for Earth and Environmental Terminology SWEET ontology 60 from NASA or the Semantic Sensor Network SSN ontology 61 from W3C 62 are good starting points for the creation of an ontology for a particular domain. There are also terminological ontologies containing controlled vocabularies for the tagging and indexing of resources of the geo and space science domain, such as GEMET (General Multilingual Environmental Thesaurus GEMET 2012).

Modeling the ISDC ontology network
The ISDC ontology (Pfeiffer 2010) was developed according to best practice process models (Noy and McGuinness 2001). The scope and domain of the ISDC ontology is the conceptual mapping of parts of the data life cycle valid for the objectives of the GFZ ISDC (Ritschel et al. 2008a). For the modeling of the ISDC ontology, both Protégé 3 63 and Protégé 4 64 have been used.

Forming a Semantic Web
The ISDC ontology network is the basic model for the Semantic Web-based GFZ ISDC proof-of-concept 65 implementation. The main ISDC classes and properties are derived from the extended GCMD DIF standard used at the operational GFZ ISDC (Pfeiffer 2010; Ritschel et al. 2012; Ritschel et al. 2008b). This means the core metadata or context information describing the data (ISDC product types and data products) is still compliant with the DIF standard. The ISDC ontology was first developed with the intention of being a one-to-one translation of the ISDC DIF schema (Ritschel et al. 2008b). The main classes are ProductType and DataProduct, describing the core context of the data itself. Instrument and Platform classes with information about the sensors and carriers of the sensors, such as observatories or satellites, provide contextual information. Additional classes for Person, Institution and Project are included to provide information on the roles of people, institutions and projects who are involved in the data life cycle. Finally, Publication and Phenomenon classes were added. An important aspect of the ISDC ontology network (Ritschel and Neher 2013) is the ability to connect ISDC ontology classes and properties with ontology entities available in Linked Data (Hebeler et al. 2009) or Linked Open Data. 66 Classes and properties from such ontologies, such as FOAF (Brickley and Miller 2014), Bibo (D'Arcus and Giasson 2009) or Geonames, 67 have been linked to the appropriate ISDC ontology entities. For example, "isdc:person owl:equivalentClass foaf:person" connects the ISDC class Person with the appropriate FOAF class. In this process, the core GCMD ontology was taken out of the ISDC ontology and the GCMD classes and properties also have been linked to the appropriate ISDC entities. Figure 3 shows the main entities and relationships of the ISDC ontology network. Most metadata elements of the schema could be transformed into object properties modeling the relationship between classes. For example, "isdc:isCreatedBy" connects individuals of ProductType with Institution (Fig. 4, relationship or property 4) and "isdc:isMeasuredBy" connects ProductType with Instrument (Fig. 4, relationship or property 10). Because the ISDC ontology is modeled in OWL (OWL Working Group 2012), powerful OWL constructs such as "owl:inverseOf" to define inverse features or "owl:transitiveProperty" to express transitive features of a property are used. For example, "isdc:isMeasuredBy owl:inverseOf isdc:measuresDataFor" expresses that the property isMeasuredBy is the inverse of the property measuresDataFor. When it is stated that a Product Type "is measured by" an Instrument, there is a corresponding inverse relationship which asserts that the Instrument "measures data for" the Product Type.
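The effect of an "owl:inverseOf" declaration can be sketched in a few lines of Python. The function below is a simplified stand-in for what an OWL reasoner derives for one pair of inverse properties; the individuals `isdc:ProductTypeA` and `isdc:InstrumentB` are hypothetical names, while the two property names are those quoted above.

```python
def apply_inverse(triples, prop, inverse):
    """Add the statements implied by 'prop owl:inverseOf inverse'.

    For every (s, prop, o) the triple (o, inverse, s) is added, and
    vice versa. The input set is not modified.
    """
    inferred = set(triples)
    for s, p, o in triples:
        if p == prop:
            inferred.add((o, inverse, s))
        elif p == inverse:
            inferred.add((o, prop, s))
    return inferred


# Hypothetical individuals, used only to demonstrate the inference
asserted = {
    ("isdc:ProductTypeA", "isdc:isMeasuredBy", "isdc:InstrumentB"),
}

closure = apply_inverse(asserted, "isdc:isMeasuredBy", "isdc:measuresDataFor")
```

Only one direction has to be stated in the metadata; a query starting from the instrument finds the product type through the inferred statement.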
In addition to the data life cycle concepts, terminological ontologies have been modeled and included into the ISDC ontology network 68 (Ritschel and Neher 2013). Again the DIF standard plays an important role. SPASE and other organizations which are providing controlled vocabularies for the indexing of entities are also included. Similar to the Parameters field of the ISDC DIF metadata documents containing controlled terms from the GCMD earth science keywords document (Olsen et al. 2013), these keywords are used as a controlled index in the ISDC ontology network. For the use of the GCMD keywords in the ISDC ontology network, the hierarchically structured science keywords have been modeled as concepts with appropriate relationships (properties) and translated into SKOS. 69 In a similar process, the SPASE "allowed values" have been classified and the hierarchically related concepts assigned to the appropriate SKOS concept schemas. 70 In addition to GCMD and SPASE keywords, the SKOS version of the GEMET vocabulary (General Multilingual Environmental Thesaurus GEMET 2012), designed and controlled by the participants of the European Environment Agency, was added to the ISDC ontology network.
68 http://rz-vm30.gfz-potsdam.de/ontology/isdc_1.4.owl.
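The translation of hierarchically structured keywords into SKOS relationships can be sketched as follows. The sketch assumes that each keyword is given as a path of terms separated by ">", and it uses the term labels themselves as concept identifiers for readability; the sample path is illustrative and the function names are hypothetical.

```python
def paths_to_skos(paths):
    """Turn keyword paths such as 'A > B > C' into broader/narrower triples."""
    triples = set()
    for path in paths:
        terms = [t.strip() for t in path.split(">")]
        # each adjacent pair in the path is a parent/child relationship
        for parent, child in zip(terms, terms[1:]):
            triples.add((parent, "skos:narrower", child))
            triples.add((child, "skos:broader", parent))
    return triples


def narrower_transitive(triples, concept):
    """Collect all direct and indirect narrower concepts of a concept."""
    found, stack = set(), [concept]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == "skos:narrower" and o not in found:
                found.add(o)
                stack.append(o)
    return found


# Illustrative keyword path (an assumption, not copied from the ISDC files)
skos = paths_to_skos(["Atmosphere > Atmospheric Chemistry > Oxygen Compounds"])
```

With such a structure, a search for data tagged with a broad concept can be expanded to all of its narrower concepts, for example via `narrower_transitive(skos, "Atmosphere")`. In the SKOS standard itself, this closure corresponds to the separate property skos:narrowerTransitive.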

Transforming GCMD's science keywords and SPASE "allowed values"
The team of the Global Change Master Directory from NASA has developed different controlled vocabularies covering the geo and space science domain, as well as geographical and specific data parameter aspects (Olsen et al. 2013). For the use within the Semantic Web approach, these vocabularies have been transformed into RDF data using the SKOS standard (W3C 1994-2012). Hierarchical relationships between keywords (SKOS concepts) have been translated into transitive semantic relations such as "…skos/core:broader" and "…skos/core:narrower." For example, "concept#Atmosphere skos/core:narrower concept#Atmospheric Chemistry" expresses that "Atmospheric Chemistry" is a narrower concept of "Atmosphere." To become independent from the notation of terms, and for future multilingualism, an independent decimal classification system has been introduced to link to the terms of the vocabulary. The English notation of the term is kept in the annotation property field "prefLabel," whereas the definition or explanation of the terms related to the specific domain of the vocabulary is documented in the annotation property field definition (Ritschel and Neher 2013). 71 The SPASE schema (King et al. 2010) 72 provides various enumeration lists and appropriate concepts for different elements. These elements are related to a specific domain, such as instrument type and measurement type or observatory region and observed region. Some enumeration lists are even hierarchically structured, such as observatory region and observed region, as demonstrated in Fig. 5. The idea to transform these lists as part of a controlled SPASE vocabulary into the SKOS format was realized by mapping such schema elements which are related to an enumeration list to an appropriate SKOS concept schema. For example, the SPASE schema element "instrument type" was mapped to the SKOS concept schema Instrument Type. The list of values then became SKOS concepts of the appropriate SKOS concept schema. Again SKOS object properties reflecting broader or narrower relationships are used for the mapping of the
71 http://isdc.gfz-potsdam.de/ontology/gcmd_science.skos.rdf.
72 http://www.spase-group.org/data/dictionary/spase-2_2_2.pdf.

Mapping and merging of domain and terminological ontologies with the example of SPASE/IUGONET, ESPAS and GFZ ISDC ontologies
Mapping and merging are techniques for the semantic integration of different domain and terminological ontologies (Allemang and Hendler 2008; Hebeler et al. 2009; Hitzler et al. 2008). Specific OWL constructs provide the capability for the mapping or merging of entities, such as classes or properties. Such OWL properties are sameAs, equivalentClass or equivalentProperty. The semantic similarity or the semantic distance of classes, properties or individuals of different ontologies is the key to semantic integration. The estimation of the semantic similarity of entities was done for the SPASE/IUGONET and GFZ ISDC domain ontologies (Schildbach 2013). Comparing the object properties for the relationship between data and instrument in the SPASE and GFZ ISDC ontologies yields a semantic similarity value of 0.81, as shown in Fig. 6. In this case, one can conclude that the object property "spase:isDataOf" is very similar to the appropriate property "isdc:isMeasuredBy." These properties can be connected using the OWL construct "owl:equivalentProperty" (Schildbach 2013).
A similar approach can be used for the connection of concepts of terminological ontologies. Using a lexical analysis, the comparison of the similarity of strings or substrings of concepts can help to estimate the semantic similarity of the concepts. Stemming and the extraction of term signatures of concepts before the string comparison increase the equivalence assumptions. A structural analysis of the terminological ontology comparing parent and child concepts also improves the process of the ontology mapping/merging. Figure 7 shows a simplified process model of the merging of two vocabularies. The terminological ontology derived from the SPASE/IUGONET schema 74,75 and the GCMD science keywords ontology 76 developed for the GFZ ISDC Semantic Web have been mapped and merged 77 (Kneitschel 2013). In this case, an automatic procedure for performing a lexical analysis, adapted for use with ontology mapping, detected 23 "equal" concepts. But only 14 concepts of the different ontologies had a real semantic similarity that justified the use of the SKOS construct "closeMatch." Examples are the concepts Atmosphere, Corona and Electric Field (Kneitschel 2013). The small number of semantically equal concepts results from the small overlap or intersection of the terminological ontologies or controlled vocabularies SPASE/IUGONET and GCMD science keywords.
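The lexical part of such a procedure can be illustrated with a short Python sketch based on the standard library module `difflib`. It is not the procedure used by Kneitschel (2013): the normalization step is a crude stand-in for stemming, and the threshold of 0.9 as well as the sample labels are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalize(label):
    """Lowercase a label and strip a trailing plural 's' from each word.

    This is only a rough substitute for a real stemming algorithm.
    """
    words = label.lower().replace("_", " ").split()
    return " ".join(w[:-1] if w.endswith("s") and len(w) > 3 else w
                    for w in words)


def candidate_matches(vocab_a, vocab_b, threshold=0.9):
    """Return (label_a, label_b, score) for lexically similar labels."""
    pairs = []
    for a in vocab_a:
        for b in vocab_b:
            score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
            if score >= threshold:
                pairs.append((a, b, round(score, 2)))
    return pairs


# Illustrative labels only, not the actual vocabulary contents
candidates = candidate_matches(["Electric Field", "Corona"],
                               ["Electric Fields", "Aerosols"])
```

The result is a list of candidates, not of confirmed mappings. As the numbers above show, lexically equal labels still need a semantic review before a "skos:closeMatch" statement is asserted.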

System architecture, frameworks and services
The next step was to use the ISDC ontology in an operational system. In a complete system, the system architecture describes the components and the relationships between the components and subcomponents, as well as the interfaces between components and the available API. This process begins with a functional view of the system architecture, which is defined by use cases that describe each workflow. This leads to a logical view of the system architecture, which is the basis for design decisions related to software implementation and hardware platforms. With a logical view of the system, it is possible to define or select a framework as the software development environment.
To determine an appropriate ISDC Semantic Web system architecture, we looked at the system architecture of our selected data portals. The overall system architecture, seen from a global scope, is very similar for the IUGONET, ESPAS and GFZ ISDC data systems. Each system architecture is layered and service oriented, consisting of the following main components: data sources, data registration, data access, harvesting and transformation, indexing and catalog ingestion, catalog search and data download. Some portals also have value-added services, such as visualization or statistics.

IUGONET platform
The IUGONET data system is built upon the open source platform DSpace 78 for the creation and management of digital repositories. Resources are described using the IUGONET/SPASE data model, 79 expressed in XML, with the XML documents managed by DSpace. 80 New resources and documents can be registered, and every single resource entity is referenced by a unique identifier. Data search and access capabilities are implemented and reflected in the GUI of the data portal.

ESPAS platform
The system architecture of the ESPAS data system 82 is a service-oriented architecture (SOA), as shown in Fig. 8. For the integration of distributed resources and applications, XML, SOAP, REST, UDDI and WSDL technologies are used (ESPAS 2013). The ESPAS data system is based on the D-NET framework 83 for the construction of digital data infrastructures. The D-NET framework provides services for data mediation, data mapping, data storage and indexing, data curation and enrichment, and data provision. After an authorized registration of distributed ESPAS resources, appropriate XML metadata documents are harvested using the OAI-PMH 84 mechanism. The implemented OGC Catalog Service (OGC CSW) 85 connects the ESPAS data providers and the centralized catalog of the ESPAS data repository over the Web. The OGC CSW catalog service also provides search capabilities. A new version of the ESPAS data system, 86 demonstrating the main features, is available on the Web.
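The OAI-PMH harvesting step mentioned above follows a simple request and response pattern, which the following sketch illustrates. It is not taken from the ESPAS implementation: the provider URL, the record identifiers and the canned response are hypothetical, and "oai_dc" is used only because it is the metadata format every OAI-PMH repository must support. The metadata prefix actually used by ESPAS is not stated in the text.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace of OAI-PMH 2.0 response documents.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH data provider."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    identifiers = [
        element.text
        for element in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_element = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_element.text if token_element is not None and token_element.text else None
    return identifiers, token


# Canned response standing in for a real provider (hypothetical identifiers).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:obs-1</identifier></header></record>
    <record><header><identifier>oai:example.org:obs-2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("https://provider.example.org/oai")
identifiers, token = parse_list_records(SAMPLE_RESPONSE)
next_url = list_records_url("https://provider.example.org/oai", resumption_token=token)
print(first_url)
print(identifiers, token)
print(next_url)
```

A harvester repeats the request with each returned resumption token until the provider returns none, and then passes the collected metadata documents on to transformation and catalog ingestion.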

GFZ ISDC platform
The operational GFZ ISDC 87 was developed using the open source PostNuke CMS and portal framework. 88 In order to adapt the functionality of the PostNuke framework to the requirements of a data system, unnecessary components were removed and others were added (Ritschel et al. 2008a). ISDC/DIF metadata are extracted from the ASCII and/or XML documents and stored in a relational database, which is the foundation for the GFZ ISDC data catalog (Mende et al. 2008). Unique identifiers, also stored in the catalog, are used to reference all granules in the data archive of the ISDC system. The main components of the current GFZ ISDC data system are proprietary and therefore not ready for interoperability.

GFZ ISDC: Semantic Web-based proof-of-concept platform
After evaluating the selected data portals, we selected the open source CMS Drupal 7 89 and the Virtuoso Universal Server 90 for the backbone of the Semantic Web-based GFZ ISDC data server. 91 Virtuoso is used for the RDF data management, providing a triple store and SPARQL endpoint. In our case, it manages the GFZ ISDC knowledge base consisting of the ISDC ontology network (OWL file) 92 and appropriate individuals (RDF data). The complete business logic of the Semantic Web-based ISDC data server is implemented in Drupal 7. The RDF triples of the GFZ ISDC knowledge base are imported from Virtuoso and indexed by an Apache Solr index server. 93 The individuals and appropriate relationships of the ISDC ontology network, including the terminological ontologies, are visualized in the GUI of the Drupal system. Drupal also provides a SPARQL interface (SPARQL 2008) for the connection of ISDC entities with external resources in Linked Open Data (LOD). For the reasons behind these choices and for a comparison of Drupal 94 and Virtuoso 95 with alternatives, such as the Apache Jena framework (Apache Software Foundation 2011-2014), we refer to Christoph Seelus's Bachelor of Arts thesis on Semantic Web CMS for scientific data management (Seelus 2014). The thesis focuses on the development of an evaluation procedure for the comparison of Semantic Web CMS, including appropriate data storage management systems, and the subsequent application of this procedure to the features of well-known Semantic Web CMS. Besides Drupal, 96 DSpace, 97 Semantic MediaWiki, 98 OntoWiki 99 and Ximdex 100 were evaluated. In addition, the Semantic Web frameworks Apache Stanbol, 101 Erfurt SWF 102 and OpenRDF Sesame 103 were tested. Without going into details, the procedure focuses on requirements and performance indicators, such as technology and system requirements, content and user management, security and software ecosystem, and especially Semantic Web features including knowledge representation, queries and rules. The results of the evaluation clearly show that none of the currently available and tested systems can really meet professional users' requirements regarding functionality and ecosystem. Only Drupal and, to a lesser degree, DSpace 104 and Semantic MediaWiki achieve satisfactory results.

87 http://isdc.gfz-potsdam.de.
88 http://www.pn-cms.de/.
89 https://drupal.org/.
90 http://virtuoso.openlinksw.com/.
91 http://rz-vm125.gfz-potsdam.de/drupal/.
92 http://rz-vm30.gfz-potsdam.de/ontology/isdc_1.4.owl.

Fig. 7 Merging workflow of two thesauri with selected process steps. Such steps are pre-integration, analysis, disambiguation, restructuring and integration and finally evaluation. This figure is taken from Gregor Kneitschel's Bachelor of Arts thesis (Kneitschel 2013)
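The import of RDF triples from the triple store into a full-text index, as described for Virtuoso and Apache Solr in this section, can be pictured as a flattening step: all triples with the same subject become one index document. The sketch below is an assumption about how such a step could look, not the Drupal implementation. The resource IRIs, the literal values and the field naming rule (local name of the predicate) are hypothetical, and a real Solr schema would have to declare the resulting fields as multi-valued.

```python
import json
from collections import defaultdict


def local_name(iri: str) -> str:
    """Return the part of an IRI after the last '#' or '/'."""
    return iri.rstrip("/").replace("#", "/").rsplit("/", 1)[-1]


def triples_to_documents(triples):
    """Group (subject, predicate, object) triples by subject and turn each
    group into one flat document with an 'id' and multi-valued fields."""
    grouped = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        grouped[subject][local_name(predicate)].append(obj)
    documents = []
    for subject, fields in grouped.items():
        document = {"id": subject}
        document.update(fields)
        documents.append(document)
    return documents


# Illustrative triples with hypothetical IRIs.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
triples = [
    ("http://example.org/isdc/instrument/1", RDFS_LABEL, "Magnetometer"),
    ("http://example.org/isdc/instrument/1",
     "http://example.org/isdc#isMountedOn",
     "http://example.org/isdc/platform/champ"),
    ("http://example.org/isdc/platform/champ", RDFS_LABEL, "CHAMP"),
]
documents = triples_to_documents(triples)
print(json.dumps(documents, indent=2))
```

The resulting list of JSON documents has the general shape that Solr's JSON update handler accepts, so that keyword search over labels can run on the index while the relationships themselves remain queryable through SPARQL.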

User interfaces and services
Graphical user interfaces and APIs for inter-machine communication are necessary for the interaction with the data systems. Such interactions include data search and catalog browsing but also data access and data download. System interoperability mainly depends on the underlying data model and also depends on API functionality. A survey of the user interfaces and APIs of the selected data portals helped to inform the selection for the ISDC Semantic Web portal.

IUGONET system interfaces and services
The IUGONET data system provides a simple but efficient GUI to the end users. 105 In accordance with the data model, metadata can be searched by resource type, and also by temporal and spatial coverage or by keywords from the controlled SPASE and GCMD science keyword vocabularies. Value-added services, such as data analysis, are realized using the IUGONET Data Analysis Software (UDAS). 106

ESPAS system interfaces and services
The ESPAS data system offers GUI-based services and APIs for data providers and end users. 107 New data resources can be registered by entering the metadata according to the data model. A Web-based harvesting mechanism automatically ingests metadata of observations and measurements from the different distributed data providers. A qualified search for data is realized using the GUI of the ESPAS data system.

GFZ ISDC system interfaces and services
The operational GFZ ISDC provides not only the search for data but also the access to and download of data files. The system also manages the documents necessary for the use of the data. For end users, the portal GUI only provides a search for data products of a specific product type. 108 There is no search across all product types that may be available in the ISDC data repository. A proprietary API provides machine-based requests for data. All requested data are delivered from the ISDC archive to end user-specific directories.

GFZ ISDC: Semantic Web-based proof of concept
Ideally, the user interface and capabilities of the ISDC Semantic Web should encompass all the capabilities of the selected data portals. We found that the RDF capabilities of Drupal 7 provide a GUI for the interaction with the Semantic Web-based GFZ ISDC data system. 109 Search for data-related context information is ontology class based and enhanced by the use of controlled vocabulary terms. Context-dependent DBpedia data (Lehmann et al. 2012) from LOD, such as DBpedia information about institutions, are automatically requested and visualized. OpenStreetMap data are used for the geographical referencing and visualization of search results. The graphical user interface of the GFZ ISDC is shown in Fig. 9.

105 http://search.iugonet.org/iugonet.
106 http://www.iugonet.org/en/software.html.
107 https://www.espas-fp7.eu/portal/index.html.
At present, the GFZ ISDC Semantic Web proof-of-concept data server, 112 based on the Virtuoso Universal Server 110 and the Drupal 7 CMS, 111 contains only a limited number of entities of the GFZ ISDC repository. The knowledge base consists of the ISDC ontology network, version 1.4, 113 and appropriate individuals. Most RDF data are related to the gravity field of the earth measured by superconducting gravimeters, but some are also related to the atmosphere and ionosphere derived from GPS measurements, and to the geomagnetic field measured by CHAMP satellite magnetometers. These data are linked with RDF data about instruments and platforms, and also persons, institutions, projects and geophenomena. SPARQL queries are used for the connection of known resources with DBpedia 114 information for institutions, instruments, platforms and geophenomena. In addition, Linked GeoData 115 from LOD is used for a visual representation of geographical information for institutions and platforms. Concepts from the SKOS ontology of the GCMD science keywords 116 are used for the tagging of product types and geophenomena. A substantial, retrievable publication collection, mainly about earth gravity research, is also included in the GFZ ISDC Semantic Web. 117
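How a SPARQL query can connect a known resource with DBpedia information is sketched below. This is an illustrative example under stated assumptions and not the query used in the Drupal system: the resource IRI is assumed rather than verified, the choice of the dbo:abstract property is ours, and the response is a canned document in the standard SPARQL JSON results format, so no network access is needed to run the sketch.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"  # public DBpedia SPARQL endpoint


def abstract_query(resource_iri: str, lang: str = "en") -> str:
    """Build a SPARQL query for the abstract of one DBpedia resource."""
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?abstract WHERE {\n"
        f"  <{resource_iri}> dbo:abstract ?abstract .\n"
        f'  FILTER (lang(?abstract) = "{lang}")\n'
        "}"
    )


def build_request(query: str, endpoint: str = DBPEDIA_ENDPOINT) -> Request:
    """Prepare a SPARQL protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def extract_values(results_json: str, variable: str):
    """Pull the values bound to one variable out of a SPARQL JSON result."""
    data = json.loads(results_json)
    return [
        binding[variable]["value"]
        for binding in data["results"]["bindings"]
        if variable in binding
    ]


# Assumed resource IRI, used for illustration only.
query = abstract_query("http://dbpedia.org/resource/Example_Institution")
request = build_request(query)

# Canned response in the SPARQL JSON results format (placeholder text).
CANNED = json.dumps({
    "head": {"vars": ["abstract"]},
    "results": {"bindings": [
        {"abstract": {"type": "literal", "xml:lang": "en",
                      "value": "Placeholder abstract text."}}
    ]},
})
abstracts = extract_values(CANNED, "abstract")
print(request.full_url)
print(abstracts)
```

Sending the prepared request with urllib.request.urlopen would return a live result in the same format. The canned document is used here so that the parsing logic can be checked offline.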

Conclusion and future work
By combining and integrating Semantic Web approaches, appropriate Web standards and LOD data, the resulting approach has the potential to play an important role in meeting the challenges of interoperability and sharing in the geo and space science domains.
Prior to the development of the GFZ ISDC Semantic Web, there was no common and unique interoperable e-science infrastructure available to connect the Japanese IUGONET, 118 European Union ESPAS 119 and GFZ ISDC 120 data portals. We found that while each of the data portals had a different data model, there were similarities of concepts. Each system was also built on a different software framework, making interoperability difficult at the API level. We found that the most promising approach to achieving interoperability was to use Semantic Web-based technology. A transformation of XML schema into OWL models is possible, 121 and with lexical analysis of definitions for terms, the semantic similarity can be quantified. By storing the metadata transformed into RDF triples in appropriate databases, 122 we were able to achieve cross-system queries and reasoning. This enables the integration of multiple domain ontologies and, through references, access to the appropriate data servers. This was fully demonstrated using the SPASE/IUGONET and GFZ ISDC ontologies. 123,124

118 http://search.iugonet.org/iugonet.
119 https://www.espas-fp7.eu/portal/.
120 http://isdc.gfz-potsdam.de.

The next important step in the realization of a Semantic Web-based e-infrastructure is the real integration of mapped or merged terminological ontologies into the data servers of the involved projects. The installation of triple stores and SPARQL endpoints provides a query-based connection to the distributed and different data resources. It is planned to publish the terminological ontologies and the mapped parts in LOD. In order to overcome the limitations of Drupal 7, 125 especially to avoid the broken links which can occur between the CMS and the triple store Virtuoso Universal Server, 126 other CMS supporting Semantic Web technology, such as OntoWiki 127 and Semantic MediaWiki, 128 were evaluated for use as a possible framework for the GFZ ISDC Semantic Web data server (Seelus 2014), as described in Sect. "GFZ ISDC: Semantic Web-based proof-of-concept platform". There is also a collaboration project with the University of Applied Sciences, Department of Information Sciences, based on the GFZ ISDC 129 for the integration of unstructured data in the Web, such as publications derived from data of the GFZ ISDC repository, using entity recognition and part-of-speech tagging methods. Further planned activities, including the validation and usage of the recently published Open Semantic Framework OSF 130 for the management of the IUGONET data repository, will also focus on the efficiency of the ontological approach and a performance comparison between appropriate relational database management systems and triple stores.
The main result from this work shows that the Semantic Web, with multilingual terminological ontologies, can establish a new collaborative science culture in the Web age.

Fig. 2
Fig. 2 Main classes and elements of the extended ISDC DIF data model. The cyan colored elements are taken from NASA's DIF standard; the yellow and green colored ones are ISDC extensions to this standard. The figure is taken from Sabine Pfeiffer's Master of Engineering Thesis (Pfeiffer 2010)
70 http://isdc.gfz-potsdam.de/ontology/spase_keywords.owl.

Fig. 3
Fig. 3 ISDC ontology network.The network is composed of the ISDC core ontology and appropriate individuals, connected with further domain and terminological ontologies

Fig. 4
Fig. 4 Object properties reflecting the relationships between main classes of the ISDC core ontology.Also shown are the corresponding inverse properties.The small numbers below the property names are cardinalities

Fig. 5
Fig. 5 Transformation of the SPASE "allowed values" as controlled vocabulary into the SKOS standard. Shown is the example of the concept schema "Observatory Region" and appropriate concepts

Fig. 6
Fig. 6 Particular result of the estimation of semantic similarities of the SPASE and ISDC domain ontologies. Shown are the similarities of the properties of the "spase:Data" and "isdc:ProductType" classes. This figure is taken from Susanne Schildbach's Bachelor of Arts thesis (Schildbach 2013)

Fig. 8
Fig. 8 ESPAS architecture overview based on service-oriented architecture principles. This figure is taken from "ESPAS, the near-Earth space data infrastructure for e-Science" (ESPAS 2013)