
Data Life Cycle Labs

A New Concept to Support Data-Intensive Science

Jos van Wezel, Achim Streit, Christopher Jung,

Rainer Stotzka, Silke Halstenberg, Fabian Rigoll,

Ariel Garcia, Andreas Heiss

Karlsruhe Institute of Technology (KIT) Karlsruhe, Germany Achim.Streit@kit.edu

Martin Gasthuber

Deutsches Elektronen-Synchrotron (DESY) Hamburg, Germany Martin.Gasthuber@desy.de

Kilian Schwarz

GSI Helmholtzzentrum für Schwerionenforschung GmbH Darmstadt, Germany k.schwarz@gsi.de

André Giesler

Forschungszentrum Jülich (FZJ) Jülich, Germany a.giesler@fz-juelich.de

Abstract—In many sciences the increasing amounts of data are reaching the limits of established data handling and processing. With four large research centers of the German Helmholtz Association, the Large Scale Data Management and Analysis (LSDMA) project supports an initial set of scientific projects, initiatives and instruments in organizing and efficiently analyzing the increasing amount of data produced in modern science. LSDMA bridges the gap between data production and data analysis with a novel approach that combines specific community support and generic, cross-community development. In the Data Life Cycle Labs (DLCL), experts from the data domain work closely with scientific groups of selected research domains in joint R&D in which community-specific data life cycles are iteratively optimized, data and metadata formats are defined and standardized, simple access and use is established, and data and scientific insights are preserved in long-term, openly accessible archives.

Keywords: data management, data life cycle, data intensive computing, data analysis, data exploration, LSDMA, support, data infrastructure

I. INTRODUCTION

Today data is knowledge – data exploration has become the 4th pillar of modern science besides experiment, theory, and simulation, as postulated by Jim Gray in 2007 [1]. Rapidly increasing data rates in experiments, measurements and simulations are limiting the speed of scientific production in various research communities, and the gap between the generated data and the data entering the data life cycle (cf. Fig. 1) is widening. Providing high-performance data management components, analysis tools, computing resources, storage and services can address this challenge, but the realization of a data-intensive infrastructure at institutes and universities is usually time consuming and always expensive. The "Large Scale Data Management and Analysis" (LSDMA) project introduced here extends the research services of the Helmholtz Association of German research centers with community-specific Data Life Cycle Laboratories (DLCL). The DLCLs are complemented by a Data Services Integration Team (DSIT), which provides generic technologies and services for multi-community use based on research and development in the areas of data management, data access and security, storage technologies and data preservation.

The LSDMA project, initiated at the Karlsruhe Institute of Technology (KIT), builds on the experience of supporting local scientists at a computer center, on running the Grid Computing Centre Karlsruhe (GridKa) [2] as the German Tier-1 hub in the Worldwide LHC Computing Grid [3] and the Large Scale Data Facility (LSDF) [4], and on the very successful Simulation Labs [5] that specialize in supporting HPC users.

Figure 1. The scientific data life cycle

The project partners from the Helmholtz Association have equally long-standing experience in large-scale data and computing support for various communities. LSDMA brings together the combined expertise of all partners as well as the data-intensive research communities local to their institutes.

II. LARGE SCALE DATA IN MODERN SCIENCE

Modern science and scientific computing are about data. In the process from data collection to publication, data is moved, aggregated, selected, visualized and analyzed. In view of the ever increasing amounts of data, this process must be organized and structured. Data management is the organization and structuring of the data life cycle, which allows faster results and dependable long-term references. The data life cycle does not stop after publication: without a suitable data organization or infrastructure (e.g. with persistent identifiers), data can no longer be found and is essentially lost.

Data has become a source of knowledge in itself [6]. By combining, overlaying, mining, visualizing and applying various analysis methods, the scientific process is now also driven by the content of already existing data collected in experiments and measurements. Data from every section of the data life cycle, including raw measurements or simulations and data from rare and unique events, has become a valuable good that needs to be preserved and is lost if not properly managed.
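
To make the idea of an organized, traceable data life cycle concrete, the following is a minimal Python sketch of how a dataset could be tracked through its life cycle with a persistent identifier and basic metadata. It is purely illustrative: the stage names, fields and identifiers are assumptions for this sketch and not part of LSDMA or any DLCL.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


# Illustrative life cycle stages; real stages are defined per community.
class Stage(Enum):
    ACQUIRED = "acquired"
    PROCESSED = "processed"
    ANALYZED = "analyzed"
    PUBLISHED = "published"
    ARCHIVED = "archived"


@dataclass
class DatasetRecord:
    pid: str                       # persistent identifier, e.g. a Handle or DOI
    location: str                  # current storage URI
    stage: Stage = Stage.ACQUIRED
    metadata: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def advance(self, new_stage: Stage, note: str = "") -> None:
        """Record a stage transition so the dataset stays findable and traceable."""
        self.history.append((datetime.now(timezone.utc).isoformat(),
                             self.stage.value, note))
        self.stage = new_stage


# Hypothetical usage: a raw measurement enters the life cycle and is later archived.
record = DatasetRecord(pid="hdl:12345/abc-001",
                       location="gridftp://datastore.example.org/raw/run0001",
                       metadata={"instrument": "example-detector", "run": 1})
record.advance(Stage.PROCESSED, note="calibration applied")
record.advance(Stage.ARCHIVED, note="moved to long-term archive")
print(record.pid, record.stage.value, len(record.history))
```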

Common to many experimental sciences are vast requirements for large-scale storage, data management and analysis resources. The actual requirements of the communities must be examined in detail, because our discussions with them revealed a large variation in data management awareness and experience across communities. Some scientific disciplines already have local solutions and are investigating e.g. the possibility of data sharing within the community, whereas others are struggling with basic data handling and have only recently begun to organize their unstructured data.

A. Data and analysis in climate and environment

1) Atmospheric research

Our understanding of the atmosphere is fundamentally limited by the sparseness of highly resolved regional and global observations. Data from remote sensing instruments like GLORIA [7] and MIPAS [8] on the ENVISAT satellite [9] promise to improve this situation substantially. Both instruments require data storage capacities of several hundred terabytes per year (raw and processed data). Further projects and instruments are on the horizon, requiring similar amounts of data storage and similar data management and analysis techniques. The raw data consist of interferograms with a very high spectral resolution that are converted in multiple steps, resulting in many (> 10000) small files per day. During processing, accessing such a large number of small files is the limiting factor compared to the computing time for analyzing the data. Because the data are irreplaceable and precious, they must be archived for several decades.
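
Since opening tens of thousands of small files per day dominates the processing time, one common mitigation is to aggregate them into larger containers before analysis. The following Python sketch is illustrative only: the directory layout and file suffix are assumptions, not GLORIA/MIPAS specifics, and real workflows might prefer a scientific container format such as HDF5 instead of tar.

```python
import tarfile
from pathlib import Path


def bundle_day(day_dir: Path, archive_path: Path) -> int:
    """Pack all small files of one day into a single tar container,
    so downstream processing opens one large file instead of >10000 small ones."""
    archive_path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with tarfile.open(archive_path, "w") as tar:
        for f in sorted(day_dir.glob("*.dat")):   # hypothetical raw-file suffix
            tar.add(f, arcname=f.name)
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical layout: raw/2012-06-01/ holds one day's interferogram files.
    n = bundle_day(Path("raw/2012-06-01"), Path("bundles/2012-06-01.tar"))
    print(f"bundled {n} files")
```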

2) Climate modeling

Climate model data from HPC simulations is being used worldwide by a growing number of users. The amount of new data ranges from 10 to 100 PB per year in international modeling activities (e.g. CMIP5 [10] and CORDEX [11]), which are archived and disseminated in globally distributed data federations. In recent years data applications have been changing from community-specific to interdisciplinary use. Data federations have been established, and scientific research is addressing more and more coupled and interdisciplinary questions such as adaptation and mitigation. After annotation and quality control, around 25% of the climate model raw data moves into long-term data archives. These model data are not reduced further, because detailed use cases are usually unknown in advance and it is therefore not predictable which output will be needed for further investigations. After passing quality control the data is freely available, and subsequent access is very unpredictable. Worldwide available archives for climate data must have ample capacity and must assure long-term availability as well as related data curation services [12].
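
As a rough, back-of-envelope illustration of the archive load implied by these figures (only the 10-100 PB/year production range and the roughly 25% archival fraction quoted above are used; nothing else is assumed):

```python
# Back-of-envelope estimate: long-term archive growth for climate model data.
production_pb_per_year = (10, 100)   # new climate model data, PB/year (quoted above)
archive_fraction = 0.25              # share moved to long-term archives after QC

low, high = (p * archive_fraction for p in production_pb_per_year)
print(f"archived per year: {low:.1f} to {high:.1f} PB")
# -> archived per year: 2.5 to 25.0 PB
```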

Data discovery, efficient data access, optimization of processing workflows, security and quality management are only a few of the aspects that have to be considered when optimizing the data life cycle of climate model data. Fast and secure data access and transfer is also an essential requirement of the community for interdisciplinary use of the archived data in international federations.

B. Data and analysis in energy research

Data-intensive computing in energy research comprises many topics that are driven by public interest and reflects the prominent international role Germany plays in the field of renewable energy. Applications at KIT focus on energy generation, intelligent distribution, energy storage and intelligent management. Very large computing requirements in energy research originate from fusion research. In preparation for the international ITER project [14], cascades of multi-scale simulations are run to improve the fusion physics model and the condition predictions for the running reactor. The ITER project also uses distributed computing, which implies the need for high-speed and reliable data exchange between supercomputers. Experiments such as the large-scale parameter studies of plasma analysis must be able to store and track the intermediate and final results of the computations as well as the associated metadata in which the simulation conditions are kept for future reference. Data must remain available for decades, since the ITER project expects its first demonstrations not earlier than 2040.

In the domain of energy distribution across a country, profiling and analysis of the consumption of electric energy using customer data to optimize the construction and use of the electric power grid is state of the art and often referred to as smart grids [15]. Originally introduced to optimize the efficiency and reduce the wear of power generation plants, smart grids are increasingly extended with smart metering in households. The steepness of the power curves is influenced by intelligently monitoring, predicting and controlling the consumption in the power grid and the power generation in classic plants and, increasingly, from alternative sources such as solar and wind. To be effective and efficient, the analysis results of large-scale data acquisition and simulation must be available quickly and in high detail. The amount of data to be stored and analyzed from

The excellence cluster CellNetworks of the University of Heidelberg [24] addresses, among others, biomedical questions in the area of structural biology and structural cell biology. Contributing to an array of scale-invariant imaging techniques, research in cryo-electron microscopy investigates 3D reconstruction of electron microscopic tomogram images using e.g. filtered backprojection and a least-squares solution of the backprojection matrix [25]. The workflow produces almost 300 images with 18 GB of data per tomogram. The setup is able to run 40 to 60 tomograms per day using two microscopes. 2D image analysis and 3D reconstruction of this data is only possible with a highly automated processing workflow and by moving part of the computation to special-purpose nodes using GPUs.

Second-generation single plane illumination microscopes (SPIM) [26] are able to record very high-resolution 3D images in only 30 seconds of recording time. This technology allows recording over a long period, collecting information about the influence of certain toxic environments on the development of a living zebra fish embryo. An observation time of 10 hours results in datasets of approximately 7 TB, which have to be transferred to the data facility, stored and analyzed. The analysis involves complex image processing algorithms like registration, segmentation and automatic feature extraction. To avoid bottlenecks resulting in huge delays in data analysis, challenges in high-performance data management and data-intensive computing have to be solved.
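
For orientation, the figures in the two microscopy paragraphs above imply the following daily data volume and sustained transfer rate; this is simple arithmetic on the quoted numbers only and makes no further assumptions about the instruments or networks.

```python
# Rough data rates implied by the two microscopy workflows described above.

# Cryo-electron tomography: 18 GB per tomogram, 40-60 tomograms per day.
gb_per_tomogram = 18
tomograms_per_day = (40, 60)
daily_gb = [n * gb_per_tomogram for n in tomograms_per_day]
print(f"tomography: {daily_gb[0]}-{daily_gb[1]} GB of raw data per day")

# SPIM: about 7 TB per 10-hour observation, to be moved to the data facility.
dataset_tb = 7
observation_hours = 10
rate_mb_per_s = dataset_tb * 1e6 / (observation_hours * 3600)   # TB -> MB
print(f"SPIM: ~{rate_mb_per_s:.0f} MB/s sustained transfer rate to keep up")
```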

E. Data and analysis in future large scale physics: the FAIR project

The Helmholtzzentrum für Schwerionenforschung GmbH (GSI) [27] operates a large and in many aspects unique accelerator facility. Centered on GSI, an international structure named FAIR (Facility for Antiproton and Ion Research) [28] will evolve. FAIR will generate beams of previously unparalleled intensity and quality. In its final design FAIR consists of eight ring colliders with up to 1,100 meters in circumference, two linear accelerators and about 3.5 kilometers of beam control tubes. The existing GSI accelerators serve as the injector. About 3000 scientists and engineers from more than 40 countries are already involved in the planning and development of the facility and its experiments. FAIR will support a wide variety of science cases: extreme states of matter using heavy ions (CBM), nuclear structure and astrophysics (NUSTAR), hadron physics with antiprotons (PANDA), and atomic and plasma physics as well as biological and material sciences (APPA).

The high beam intensities at FAIR pose various challenges: very efficient accelerators, remote handling in activated areas, novel methods of cooling for the detectors, progress in collaborative computing, and synergies between the various fields of research, especially concerning the interaction with industry. The first beam is expected in 2018, which constitutes a crucial milestone for the computing.

For two of the research topics of FAIR (CBM and PANDA) the characteristics of the computing and data analysis are similar to those of the LHC at CERN, which relies on trigger systems, but need further extensions. Trigger systems were necessary due to real-world limitations in data transport and processing bandwidth. At FAIR a novel triggerless detector read-out will be implemented, without conventional first-level hardware triggers, relying exclusively on event filters. This approach is much more flexible and adaptable to yet unforeseeable needs, because the full detector information is available even in the first decision stage. Complex filtering and flexibility are the key enabling factors for the experiments at FAIR. Thus at FAIR the classical separation between data acquisition, trigger, and off-line processing merges into a single, hierarchical data processing system handling an initial data stream exceeding 1 TB/s. The first layer of the system constitutes the first level event selector (FLES). The FLES implements a combination of specialized processing elements such as GPUs, CELL processors or FPGAs together with standard HPC clusters comprising compute nodes coupled by an efficient high-speed network. After the FLES the data stream fans out into an archival system and into the next distributed processing layers, which can be off-site.

Currently, large-scale research infrastructures like the LHC at CERN rely on on-site tier-0 data centers which perform the first-level processing after the last trigger step; subsequently a considerably reduced amount of data is analyzed off-site in downstream data centers. In contrast, FAIR will make use of a novel modular data processing paradigm using multi-site load-balancing data centers. Several sites in the surroundings of GSI will be connected with a high-speed Metropolitan Area Network via fibre link, allowing the off-loading of processing between the sites. Multi-site user access and corresponding security aspects are currently being investigated within the EU project CRISP [29]. That combined tier-0/1 system will be integrated into an international Grid/Cloud infrastructure. The FAIRGrid distributed computing infrastructure is currently implemented as two separate entities: PandaGrid [30] and CBMGrid. FAIRGrid uses the AliEn middleware [31], as developed by the ALICE experiment. Grid monitoring and data supervision are done via MonALISA. The basis of the software for simulation, reconstruction and data analysis for the large FAIR experiments is FairRoot [32]. In order to meet peak demands for computing, it may be necessary to offload some of the computing tasks to public or community clouds. Thus it becomes important to deploy and operate an infrastructure to compute these tasks on virtual machines in a quick and scalable way.
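
The triggerless, filter-based read-out described above can be pictured as a purely software-defined selection chain. The following Python sketch is a highly simplified, hypothetical illustration of that idea: the toy event generator, the filter criterion and the selection thresholds are invented for this example, and it makes no claim about the actual FLES or FAIR software.

```python
import random
from typing import Iterator


def detector_stream(n_events: int) -> Iterator[dict]:
    """Toy stand-in for the full, untriggered detector read-out."""
    for i in range(n_events):
        yield {"id": i,
               "energy": random.expovariate(1 / 50.0),
               "tracks": random.randint(0, 30)}


def first_level_filter(event: dict) -> bool:
    """Arbitrary software selection: because the full event is available,
    the criterion can be far more complex than a hardware trigger decision."""
    return event["energy"] > 120.0 and event["tracks"] >= 5


def process(n_events: int) -> None:
    selected = 0
    for event in detector_stream(n_events):
        if first_level_filter(event):
            selected += 1   # would fan out to the archive and the next processing layer
    print(f"{n_events} events read, {selected} selected "
          f"({100 * selected / n_events:.1f}% kept by the software filter)")


if __name__ == "__main__":
    process(100_000)
```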

The exact amount of computing, storage and archiving required for FAIR depends on many factors but is certainly beyond the capacity of a single computer center. The required resources are dominated by CBM and PANDA. Current estimates for the sum of all experiments are 200,000 cores and 30 PB of storage space for the first year of data taking. The load-balancing distributed infrastructure must be developed, as well as the seamless integration with existing grid and cloud infrastructures. The re-use of components from existing, running high-energy physics experiments poses specific challenges for data management and distributed computing that have to be solved within a few years.

III. RELATED ACTIVITIES AND INFRASTRUCTURES

A. GridKa

In 2002, the former Research Centre Karlsruhe (now KIT) started GridKa [2], providing compute power and storage for high energy physics experiments. Many physics experiments, e.g. CDF, D0, BaBar and COMPASS, used GridKa for production, while the LHC experiment collaborations were already starting to simulate their experiments and detectors. During this time GridKa participated in the EU DataGrid project and later in the LCG-1 and LCG-2 Grid projects.

In the following years the number of nodes and the storage capacity were increased yearly to accommodate the requirements of the particle physics community, during which time the LHC experiments developed and tested their distributed computing models. Numerous "Data and Service Challenges" were performed to test these models and demonstrate the capabilities of the various participating Grid centers. In 2008, the Worldwide LHC Computing Grid [3] achieved the required target data rates and was able to demonstrate its readiness to store, distribute and compute the data to be produced by the LHC.

GridKa is one of 11 Tier-1 centers of the Worldwide LHC Computing Grid and one of four centers providing services and resources for all four large LHC experiments. It currently provides 130,000 HEP-SPEC'06 [33], approximately 11 PB of disk space and 17 PB of tape space. The disk and tape space is provisioned by several Grid-enabled dCache [34] and XRootD [35] storage systems, accessible from the wide area network and capable of delivering I/O rates of several gigabytes per second. The gLite File Transfer Service [36], which controls wide-area data traffic to and from other Grid centers, as well as workload management systems, information systems and file catalogues, are provided as highly available Grid services.
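
To illustrate how such Grid storage is typically reached from a user's perspective, the sketch below wraps the standard XRootD copy client (xrdcp) from Python. The endpoint and file path are hypothetical placeholders; it assumes the XRootD client tools and a valid Grid proxy are installed, and bulk transfers between centers would normally go through the managed File Transfer Service rather than ad-hoc copies like this.

```python
import subprocess


def fetch_from_xrootd(remote_url: str, local_path: str) -> None:
    """Copy a single file from an XRootD-served storage element using xrdcp.
    Assumes the XRootD client tools and valid Grid credentials are available."""
    subprocess.run(["xrdcp", remote_url, local_path], check=True)


if __name__ == "__main__":
    # Hypothetical endpoint and file name, for illustration only.
    fetch_from_xrootd("root://se.example.org//store/user/example/data.root",
                      "data.root")
```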

B. HPC Simulation Labs

At the Karlsruhe Institute of Technology (KIT) and the Forschungszentrum Jülich (FZJ), Simulation Laboratories (SimLabs) for different research communities have been established. For example, the SimLab Energy at KIT focuses on models of chemical processes and of flows, e.g. hydrogen combustion and neutron physics, while the SimLab NanoMikro at KIT targets the simulation of nanostructured materials, e.g. with density functional theory. The other SimLabs at KIT are the SimLab Elementary and Astro-Particles and the SimLab Climate and Environment. At FZJ the SimLabs Biology, Molecular Systems, Plasma Physics and Climate Science exist, the latter in close collaboration with the respective SimLab at KIT.

A SimLab [5] is a community-oriented research and support group acting as an interface between the supercomputing facilities and the university groups, institutes and communities. Its mission is to provide high-level support in utilizing HPC facilities, especially at FZJ and KIT, by porting model systems to supercomputers and subsequently optimizing the performance of these models by enhancing their parallel scalability, load balancing and data access. The focus is support for High Performance Computing (HPC), but a SimLab also performs its own research together with the scientific communities. This work is typically done in close cooperation with scientific institutes through common projects.

The relationship between SimLabs and Data Life Cycle Labs in LSDMA is that in both instruments IT experts work in close collaboration with domain scientists. In the case of the SimLabs, the focus is on the particular challenges of enabling scientific communities to make efficient use of supercomputers for their simulation codes, in particular with regard to multi/many-core processors and Peta/Exaflop/s system performance.

C. LSDF

The Large Scale Data Facility (LSDF) [4] was started at the end of 2009 at KIT with the aim of addressing the data management and processing requirements of several forthcoming data-intensive experiments [37]. In particular, the high-throughput microscopy experiments used for biological studies of zebra fish and the tomography beamline of the synchrotron radiation source ANKA were planning to collect data amounting to several petabytes in the coming years. Providing not only the storage capacity but also the data management and analysis services was identified as a critical aspect where a unified campus-wide approach would profit from many synergies and relieve the communities from administering an IT infrastructure. Soon after the project started, many scientific groups on the campus expressed their interest and joined. Currently the LSDF provides storage through the GridFTP, NFS and CIFS protocols [37], as well as an iRODS data grid [38]. The experimental data benefits from data management functionality if ingested with the tools developed within the project. Additionally, the data analysis can be carried out by means of the project's Cloud or Hadoop services [39].
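
As a sketch of what metadata-aware ingest into such an iRODS data grid can look like, the following wraps the standard iRODS icommands (iput and imeta). The zone, collection paths and metadata keys are hypothetical, it assumes the icommands are installed and an iRODS session has been initialized, and it does not reproduce the LSDF's own ingest tools mentioned above.

```python
import subprocess


def ingest(local_file: str, irods_path: str, metadata: dict) -> None:
    """Upload a file into an iRODS data grid and attach descriptive metadata,
    so it can later be found by attribute queries instead of path guessing."""
    subprocess.run(["iput", local_file, irods_path], check=True)
    for key, value in metadata.items():
        # imeta add -d <dataObject> <attribute> <value>
        subprocess.run(["imeta", "add", "-d", irods_path, key, str(value)],
                       check=True)


if __name__ == "__main__":
    # Hypothetical zone, collection and metadata keys, for illustration only.
    ingest("scan_0001.tif",
           "/exampleZone/home/microscopy/scan_0001.tif",
           {"experiment": "zebrafish-screen", "instrument": "SPIM", "run": 1})
```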

A close collaboration of LSDMA and LSDF is natural, as LSDMA builds to a large extent on services and infrastructure such as the LSDF provides. Currently the LSDF hosts around 1 PB of data and provides Hadoop and Cloud resources to scientific experiments from 17 KIT institutes. The number of scientists using these services is in the range of 80 to 100.

D. EUDAT

The European Commission funds the European Data Infrastructure (EUDAT) [40], which aims at establishing a cost-efficient and high-quality collaborative data infrastructure throughout Europe. It addresses the needs and requirements of various scientific disciplines both in terms of capacity and capability. 25 partners from 13 countries focus on the development and operation of services for federating existing resources, which are provided by the EUDAT partners. Both FZJ and KIT are partners in LSDMA and EUDAT. Further services address common, baseline data services for a variety of communities, long-term archival with bit-stream persistence and integrity for long-term and distinct referencing of data, proper upload and download services for users, as well as a federated security infrastructure for authentication and authorization. A close collaboration of LSDMA and EUDAT is foreseen, in which national services and processes are aligned with those at the European level in order to enable scientific communities and boost pan-European data-driven scientific discovery.

access, improved data aggregation and proper interfaces for platform integration.

All R&D in LSDMA is based on the communities' requirements for an optimized data life cycle. As this optimal data life cycle evolves with time, R&D in LSDMA has to adjust swiftly, dropping the development of specific software tools and replacing them with other tools.

C. Time Line

As the needs and requirements of the communities change over time, two major reports are fully updated each year. In spring, a detailed requirements analysis is composed by the DLCLs on the basis of in-depth discussions with the communities; it gives an extensive overview of the current and the desired state of each data life cycle and summarizes the steps to be taken to reach the latter. During summer, the project schedule for the upcoming 18 months is reviewed and adjusted to the needs described in the detailed requirements analysis and to the latest technical state of the art.

Starting in the second year, there is an annual community forum where communities exchange their experiences and articulate their requirements and wishes. Each spring there is an all-hands meeting; each fall there is an international LSDMA symposium bringing together big data experts from all over the globe and LSDMA collaborators.

V. CONCLUSIONS

Recognizing data as a source of new knowledge, the deployment of five Data Life Cycle Labs within the LSDMA project of the German Helmholtz Association is well underway. By combining the competence of several research centers, universities and communities in a novel superstructure, an accepted and efficient research and development platform for the management of the data life cycle has been successfully established. The outlook towards a generic infrastructure in which all Helmholtz centers participate ensures long-term support for many scientific communities and gives the project a leading role in Germany.

The core building block is the Data Life Cycle Lab (DLCL), in which data experts work closely with community scientists on joint R&D topics. Together they address the iterative optimization of community-specific data life cycles, the creation of services for simple access and use of data, and local, national and international infrastructures for data storage, data processing and data archival as well as for international collaboration. The DLCLs are complemented by a Data Services Integration Team (DSIT), in which a specialized team of data experts works on generic, multi-community data management and analysis services and, together with the DLCLs, on the integration of these services into the data life cycles of the scientific communities. The developed generic components and tools can be reused and applied across a growing range of scientific disciplines, allowing faster results and compliance with requirements for publishing results of scientific activities.

Already in its first year of existence, the project has gained considerable momentum in the scientific community, and new communities are interested in establishing DLCLs for their specific needs.

With its concept, LSDMA supports scientific communities through advanced data exploration and preservation services and faster data extraction from original, derived and archived data, thereby gaining scientific insight more immediately and enabling new societal knowledge in the future.

ACKNOWLEDGMENT

The authors wish to thank all people and institutions involved in defining and setting up the LSDMA project as well as the German Helmholtz Association for providing the funding.

REFERENCES

[1] J. Gray. eScience Talk at NRC-CSTB meeting, Mountain View, CA, 11 January 2007, http://research.microsoft.com/en-us/um/people/gray/talks/NRC-CSTB_eScience.ppt
[2] Grid Computing Centre Karlsruhe, http://www.gridka.de/
[3] Worldwide LHC Computing Grid, http://wlcg.web.cern.ch/
[4] A. García, S. Bourov, A. Hammad, J. van Wezel, B. Neumair, A. Streit, V. Hartmann, T. Jejkal, P. Neuberger, R. Stotzka. The Large Scale Data Facility: Data Intensive Computing for scientific experiments. Proceedings of the 12th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC-11/IPDPS-11), IEEE Computer Society, 1467-1474 (2011), http://dx.doi.org/10.1109/IPDPS.2011.
[5] N. Attig, R. Esser, P. Gibbon. Simulation Laboratories: An Innovative Community-Oriented Research and Support Structure. In: CGW'07, ACC Cyfronet AGH, 2008, ISBN 978-83-915141-9-1, pages 1-
[6] T. Hey, S. Tansley, K. Tolle. The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research, ISBN 978-0982544204, http://research.microsoft.com/en-us/collaboration/fourthparadigm/
[7] M. Riese, F. Friedl-Vallon, R. Spang, P. Preusse, C. Schiller, L. Hoffmann, P. Konopka, H. Oelhaf, T. von Clarmann, M. Höpfner. GLObal limb Radiance Imager for the Atmosphere (GLORIA): Scientific objectives. Adv. Space Res., 36:989–995, 2005.
[8] M. Endemann. MIPAS instrument concept and performance. In: Proceedings of ESAMS '99 – European Symposium on Atmospheric Measurements from Space, pages 29–43, 1999.
[9] ESA Envisat, http://www.esa.int/Envisat
[10] E. Guilyardi, V. Balaji, S. Callaghan, C. DeLuca, G. Devine, S. Denvil, R. Ford, C. Pascoe, M. Lautenschlager, B. Lawrence, L. Steenman-Clark, S. Valcke. The CMIP5 model and simulation documentation: a new standard for climate modelling metadata. CLIVAR Exchanges Special Issue No. 56, Vol. 16, No. 2, May 2011.
[11] F. Giorgi, C. Jones, G. R. Asrar. Addressing climate information needs at the regional level: The CORDEX framework. World Meteorological Organization Bulletin 58(3), 175–183, 2009.
[12] M. Lautenschlager. Institutionalisierte „Data Curation Services". Aus: Handbuch Forschungsdatenmanagement, herausgegeben von Stephan Büttner, Hans-Christoph Hobohm, Lars Müller, BOCK + HERCHEN Verlag, Bad Honnef 2011, S. 149-
[13] M. Greenwald, D. P. Schissel, J. R. Burruss, T. Fredian, J. Lister, J. Stillerman. Visions for Data Management and Remote Collaboration on ITER. Proceedings of the 10th International Conference on Accelerator and Large Experimental Physics Control Systems, Geneva, Switzerland, 2005.
[14] International Thermonuclear Experimental Reactor, http://www.iter.org/
[15] S. Massoud Amin, B. F. Wollenberg. Toward a smart grid: power delivery for the 21st century. IEEE Power and Energy Magazine 3(5), 34-41.
[16] M. Mültin, F. Allerding, H. Schmeck. Integration of electric vehicles in smart homes – An ICT-based solution for V2G scenarios. IEEE PES Innovative Smart Grid Technologies, Washington DC, USA, January 2012.
[17] C. Daniel, J. O. Besenhard. Handbook of Battery Materials, Second Edition. Wiley-VCH Verlag GmbH & Co. KGaA, 2011.
[18] http://www.fz-juelich.de/inm/inm-1/EN/Forschung/_docs/BrainMapping/BrainMapping_node.html
[19] K. Zilles, K. Amunts. Centenary of Brodmann's map – conception and fate. Nature Reviews Neuroscience 11(2): 139-145.
[20] The ANKA synchrotron facility, http://ankaweb.fzk.de/
[21] R. Stotzka, W. Mexner, T. dos Santos Rolo, H. Pasic, J. van Wezel, V. Hartmann, T. Jejkal, A. Garcia, D. Haas, A. Streit. Large Scale Data Facility for Data Intensive Synchrotron Beamlines. 13th International Conference on Accelerator and Large Experimental Physics Control Systems, 2011.
[22] P. Lemmer, M. Gunkel, D. Baddeley, R. Kaufmann, A. Urich, Y. Weiland, J. Reymann, P. Müller, M. Hausmann, C. Cremer. SPDM – light microscopy with single molecule resolution at the nanoscale. Appl. Phys. B 93: 1-12, 2008; P. Müller, E. Schmitt, A. Jacob, J. Hoheisel, R. Kaufmann, C. Cremer, M. Hausmann. COMBO-FISH enables high precision localization microscopy as a prerequisite for nanostructure analysis of genome loci. Int. J. Molec. Sci. 11, 4094-4105, 2010.
[23] N. Becherer, H. Jödicke, G. Schlosser, J. Hesser, F. Zeilfelder, R. Männer. On Soft Clipping of Zernike Moments for Deblurring and Enhancement of Optical Point Spread Functions. In: Image Processing: Algorithms and Systems, Neural Networks, and Machine Learning, Proceedings of the SPIE, pp. 73-83.
[24] CellNetworks, http://www.cellnetworks.uni-hd.de/
[25] K. Schultheiss, J. Zach, B. Gamm, M. Dries, N. Frindt, R. R. Schröder, D. Gerthsen. New electrostatic phase plate for phase-contrast transmission electron microscopy and its application for wave-function reconstruction. Microsc. Microanal. 16, pages 785-
[26] A. Kobitski, J. C. Otte, B. Schäfer, U. Strähle, U. Nienhaus. Development of digital scanned laser light sheet microscopy with high spatial and temporal resolution. Accepted presentation at the Annual Meeting of the German Biophysical Society, 23-26 September 2012, Göttingen, Germany.
[27] GSI Helmholtzzentrum für Schwerionenforschung GmbH, http://www.gsi.de/en/
[28] FAIR – Baseline Technical Report, Volume 1-6, September 2006, ISBN 3-9811298-0-6; for a PDF version see http://www.fair-center.eu/fair-users/publications/fair-publications.html
[29] CRISP, http://www.crisp-fp7.eu/
[30] D. Protopopescu, K. Schwarz. PandaGrid – A Tool for Physics (PS36-2-1). Journal of Physics: Conference Series 331 (2011) 072028.
[31] AliEn, http://alien2.cern.ch/
[32] FairRoot, http://fairroot.gsi.de/
[33] HEP-SPEC'06 Benchmark, http://w3.hepix.org/benchmarks/doku.php
[34] dCache, http://www.dcache.org/
[35] XRootD, http://xrootd.slac.stanford.edu/
[36] gLite File Transfer Service (FTS), https://www.gridpp.ac.uk/wiki/GLite_File_Transfer_Service
[37] M. Sutter, V. Hartmann, M. Götter, J. van Wezel, A. Trunov, T. Jejkal, R. Stotzka. File Systems and Access Technologies for the Large Scale Data Facility. In: Remote Instrumentation for eScience and Related Aspects, Springer, 2012, ISBN 978-1-4614-0508-5, pages 239-256, http://dx.doi.org/10.1007/978-1-4614-0508-5_
[38] M. Hedges, T. Blanke, A. Hasan. Rule-based curation and preservation of data: A data grid approach using iRODS. Future Generation Computer Systems, Vol. 25, Iss. 4, Elsevier, 2008, pages 446-452, http://dx.doi.org/10.1016/j.future.2008.10.
[39] T. White. Hadoop: The Definitive Guide. O'Reilly, 2010, ISBN 978-1449389734.
[40] EUDAT, http://www.eudat.eu/
[41] Concept for improving Supercomputing and Big Data in the German Helmholtz Association, in German, not yet publicly available.
[42] e-IRG-HLG (High Level Expert Group on Scientific Data). Riding the Wave: How Europe can gain from the rising tide of scientific data. European Commission, 2010, http://cordis.europa.eu/fp7/ict/e-infrastructure/docs/hlg-sdi-report.pdf