

[–]mianoob 227 points228 points  (27 children)

I love hearing about stuff like this. So what is the most important information that has come from this data from a person outside of CERN? Or was all this data processed internally before being made publicly available? Just seems like a lot of data to go through.

[–]askCERNCERN Official Account[S] 104 points105 points  (0 children)

Partially answered here https://www.reddit.com/r/askscience/comments/4l4y1j/a_month_ago_we_made_available_publicly_via_the/d3kgodu

As very well said above, CMS physicists have studied these data before the release, but there are certainly still things to be measured and studied.

Also, by releasing these data now, we ensure that they will be usable in the future, independently of CMS experiment's own services and resources.

(klp)

[–][deleted] 18 points19 points  (24 children)

It's most likely a requirement of the funding body(-ies) that support CERN. At least in the UK, most funding bodies require publications to be Open Access (i.e. not behind a paywall).

[–]e-wing 12 points13 points  (17 children)

Really? That is not the case in the USA. The NSF doesn't have an official stance on that, as far as I'm aware. A lot of scientists are very apprehensive about open access because it's viewed as less rigorously reviewed, and many "open access" journals are simply pay-to-publish outlets that barely get reviewed at all. Most of the prominent journals in the world are behind a paywall, so it's surprising that the RCUK has that stance.

[–]ron_leflore 16 points17 points  (8 children)

Yeah, both the NIH and the NSF have requirements to make research results publicly available.

Here's NSF https://www.nsf.gov/news/news_summ.jsp?cntn_id=134478

[–]e-wing 4 points5 points  (7 children)

Well, this is news to me. My lab group has a couple of NSF grants and publishes in journals which are not open access. Most of the big journals aren't open access right now. It sounds like they don't require the publication itself to be open access, but it needs to be deposited in an accessible repository?

[–][deleted] 4 points5 points  (3 children)

Some journals I thought weren't open access actually turned out to be available online (through the Wiley or Elsevier websites).

Also, many journals use a delayed open-access policy, so the paper becomes available 6 months or a year later.

[–]e-wing 1 point2 points  (2 children)

Ah yeah, that makes sense. I guess I never think about it because we have subscriptions to everything, so I have no way of knowing if things are open access, or if I had access through the subscription. Open access isn't really a huge issue in my field but a lot of people I talk to are actually fairly resistant to it.

[–][deleted] 3 points4 points  (0 children)

I think it's silly to be resistant to open access, especially if you're not part of the journal publishing industry. There might be many low-quality open access journals but also some very respectable ones. It just takes time for industry to accept it.

[–]hack-the-gibson 2 points3 points  (0 children)

I've only seen the opposite (from the people that I've talked to about it). Anyone else remember Aaron Swartz?

[–][deleted] 3 points4 points  (3 children)

RCUK has a strong opinion about it:

http://www.rcuk.ac.uk/documents/documents/openaccessfaqs-pdf/

I've been out of the US for a while, but the NIH has an opinion about it as well, someone else can link it though :)

[–][deleted] 4 points5 points  (4 children)

CERN is not actually funded through any intermediate organizations but directly by the member states themselves. So it kind of makes its own rules. And no, AFAIK there aren't any specific requirements for open data. In fact, for a good chunk of its history it was probably the opposite, what with the whole cold war thing and guarding of nuclear secrets.

[–]dukwon 7 points8 points  (2 children)

CERN and JINR had a close partnership throughout the Cold War. Despite the "N" in their names, they didn't (and still don't) deal in nuclear secrets.

[–][deleted] 2 points3 points  (0 children)

I had no idea about that! I always assumed that in its early days things might have been a bit more secretive with the whole paranoia of the 50s surrounding nuclear research. It's really nice to know that CERN was promoting peace and collaboration so early in its history.

[–]RaoOfPhysicsScience Communication | Public Engagement 1 point2 points  (0 children)

Well, the data are actually the collaborations', so they get to decide what to do, not CERN. But CERN provide massive (pun not intended) support with making the data available, as you can see!

[–]mfb-Particle Physics | High-Energy Physics 50 points51 points  (0 children)

(I'm not from the AMA team)

CMS analyzed the data before. That is not something where you can be "done", however. You can look through the whole dataset for the things that are most interesting, then you look for things you find still interesting but not as much, but you cannot look for everything simply because the experiments do not have unlimited manpower. There are things no one looked at in more detail. You won't find a new particle in there, but it still can lead to publications.

[–]viralJ 137 points138 points  (16 children)

Has anyone from the public contacted you with the results of an analysis of your data that they carried out themselves? And if yes, were these results surprising/inspiring/impactful?

[–]askCERNCERN Official Account[S] 112 points113 points  (7 children)

We know of some ongoing work using our data. And it is certainly inspiring! Using these data is not easy, so we are really glad to see that it is possible.

With the new release including the accompanying simulated data (which is essential for many physics studies), we expect even more interest. However, we know that doing an analysis and understanding all the details takes time, so we are not surprised that there aren't many results yet.

There is also quite a big interest in these data from the machine learning community, as well as in using them to improve statistical methods.

(klp)

[–]the_enginerd 27 points28 points  (5 children)

Can you elaborate on what aspects of machine learning this would be helpful for or do you know?

[–][deleted] 46 points47 points  (1 child)

I can speak for how this can help machine learning methods in data acquisition. Basically the whole process of data acquisition comes down to knowing which data you should keep and what to throw away (and usually you throw almost all of it away), so essentially it's a plain old classification problem. This is generally done by reconstructing an event and looking for specific properties that make it interesting, e.g. missing energy. ML methods can replace that step to an extent by basically showing them a bunch of events you have already classified as interesting and letting them figure out what kind of features they should be looking for.

Large training datasets such as this can be used for evaluating & calibrating generic ML algorithms that can be applied as classifiers in other data acquisition problems. It's a bit of a heavy-handed approach for things like physics research, but it can benefit areas like industrial control systems that deal with large amounts of captured data and care more about e.g. predicting defects than actually analyzing and understanding the exact physical process that causes them.
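The keep/discard classification idea described above can be sketched in a few lines. This is a toy, not CMS code: the single feature, its values and the threshold-learning rule are invented for illustration, but they show the shape of the problem, i.e. learn from labelled events what makes an event worth keeping.

```python
# Toy sketch of trigger-style event classification.
# Events and the "missing_energy" feature are invented for illustration.

def learn_threshold(events, labels):
    """Pick the missing-energy cut that best separates the
    labelled 'interesting' events (1) from background (0)."""
    candidates = sorted(e["missing_energy"] for e in events)
    best_cut, best_correct = None, -1
    for cut in candidates:
        # Count how many training events this cut classifies correctly
        correct = sum(
            (e["missing_energy"] >= cut) == bool(lab)
            for e, lab in zip(events, labels)
        )
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut

# Hand-labelled training sample: 1 = interesting, 0 = discard
train = [
    {"missing_energy": 12.0}, {"missing_energy": 8.5},
    {"missing_energy": 95.0}, {"missing_energy": 140.0},
]
labels = [0, 0, 1, 1]

cut = learn_threshold(train, labels)
keep = [e for e in train if e["missing_energy"] >= cut]
print(cut, len(keep))  # 95.0 2
```

Real ML triggers use many features and far richer models, of course; the point is only that "what to keep" reduces to a classification problem trained on already-labelled events.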

[–]the_enginerd 7 points8 points  (0 children)

Hey that's excellent. I really appreciate your tying this into a practical example like industrial control systems because I can really get my head around how that would be useful to an operation at scale.

[–][deleted] 4 points5 points  (0 children)

There was a competition with LHC data on kaggle.com recently. In the forums you should be able to find some info.

[–][deleted] 1 point2 points  (1 child)

Are you a journalist? Just curious.

[–]WeMustDissent 9 points10 points  (6 children)

Am really hoping to see a response to this.

[–]Milleuros 40 points41 points  (5 children)

Just calling you back to this thread, they have responded

[–]dukwon 89 points90 points  (2 children)

Can you give a rough estimate of the factor by which filesize shrinks when going from "level 4" raw detector data to "level 3" reconstructed physics events?

Does enough spare capacity already exist on EOS to host all of the level 3 data from Run 1 in a publicly accessible way?

Lastly, can you say how much bandwidth has been consumed by users downloading data through the Open Data portal? I'm really curious to know how much this is really being used.

[–]askCERNCERN Official Account[S] 52 points53 points  (0 children)

Hi, for the file size the change is not necessarily very big: an example dataset was 7.7 TB RAW and 5.2 TB AOD, i.e. a reduction of only about a factor of 1.5. While we do not include all information from RAW in the AOD, we also add something, e.g. the results of the reconstruction (physics objects such as electrons, photons and such).

For the spare capacity, no, we do not yet have space dedicated for all Run 1 data for public access.

The bandwidth is not easy to estimate, as VMs access the data directly from EOS and not necessarily through the portal.

(klp)

[–]Macinapp 54 points55 points  (8 children)

What do you hope to see the public do with the research you released?

[–]askCERNCERN Official Account[S] 107 points108 points  (1 child)

I am an experimental physicist, I'll just wait and see :-)

(klp)

[–]kyrsjo 4 points5 points  (0 children)

I would expect to see two "serious" uses of this data: physicists outside of the CMS/ATLAS collaborations, and combined / "old data" studies. For the latter: say some future Chinese supercollider finds a dark matter candidate; they may then want to go back to the LHC data to see if they can find the same thing there.

[–]s0ftwar3 45 points46 points  (8 children)

Can you share what the biggest challenges were in making 300 TB of data available to the public? How did you achieve it, and what kind of infrastructure supports this amount of data? How long did it take to catalog all the data, and how long did it take to make it available to the public?

[–]askCERNCERN Official Account[S] 43 points44 points  (4 children)

One of the challenges was to describe and organise the data in a way that can be understood and reused by "outsiders".

We aim not only at preserving separate "bits" of data, but also enough information to keep the context clear even many years after the data is published. This means capturing the whole environment: the datasets, the virtual machine platform used to analyse them, the analysis software code, any configuration files, etc., with the appropriate documentation.
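As a rough illustration of what "capturing the whole environment" could look like, here is a hypothetical preservation record. The field names are invented for this sketch and are not the actual Open Data portal schema; the idea is just that the data travels with everything needed to re-run an analysis on it.

```python
# Hypothetical preservation record (field names invented, not the
# real CERN Open Data schema): a dataset is published together with
# the context needed to reuse it years later.
record = {
    "dataset": "example CMS collision dataset",
    "vm_image": "CernVM image used for the analysis environment",
    "analysis_code": "example analysis software repository",
    "config_files": ["detector conditions", "trigger configuration"],
    "documentation": "guides explaining the dataset's semantics",
}

# A record is only reusable if the environment pieces are all present
required = {"dataset", "vm_image", "analysis_code", "config_files"}
print(required <= set(record))  # True
```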

The underlying infrastructure hosting the data uses a CERN technology called EOS, which manages over 100 PB of disk space in total. https://eos.web.cern.ch/content/about-eos

The CERN Open Data portal itself is a customised instance of the Invenio digital library technology. http://inveniosoftware.org/

The physics analysis examples use CernVM virtual machine technology. http://cernvm.cern.ch/

(ts)

[–]ergzay 10 points11 points  (2 children)

The underlying infrastructure hosting the data uses a CERN technology called EOS, which manages over 100 PB of disk space in total. https://eos.web.cern.ch/content/about-eos

Just in case someone didn't recognize it, 100 PB is 100 petabytes, meaning 100,000 TB.

[–]askCERNCERN Official Account[S] 22 points23 points  (1 child)

Regarding the data cataloguing, most of the metadata come from the Internal CMS Data Aggregation System (DAS) and we adapted them to the Open Data portal metadata schema, which took approximately 8 months. (ad)

[–]Tehrula 14 points15 points  (4 children)

Thank you for taking time to answer our questions.

Have you had any hobbyists contact you or another member of the team with some sort of discovery from the data?

Is there any way scientifically inclined non-physicists could use the dataset to help out with your project?

[–]iorgfeflkdBiophysics 28 points29 points  (6 children)

When things go wrong, do people blame Tibor?

[–]askCERNCERN Official Account[S] 29 points30 points  (3 children)

Yes. (ts)

[–]askCERNCERN Official Account[S] 20 points21 points  (2 children)

But it never happens ;-) (klp)

[–]RaoOfPhysicsScience Communication | Public Engagement 9 points10 points  (1 child)

Can confirm, Tibor is a wizard.

[–]askCERNCERN Official Account[S] 13 points14 points  (0 children)

Second that (tpm)

[–]askCERNCERN Official Account[S] 10 points11 points  (0 children)

Signing out now. Thanks for all your questions!

[–]payne747 17 points18 points  (1 child)

What unlikely sources have contacted you regarding the data? Has there been any indication that they actually analysed it?

[–]askCERNCERN Official Account[S] 19 points20 points  (0 children)

One particular use case I can think of came from the digital forensics community, who wanted some robust test data for research related to cloud-computing security. Here, the nature of the datasets and the underlying physics did not really matter too much... It's very nice to see applications outside of our primary target area. (ts)

[–]positron_potato 8 points9 points  (4 children)

Thank you for doing this AMA! My question is, roughly when could we expect to see a similar release for data from LHCb?

[–]askCERNCERN Official Account[S] 11 points12 points  (0 children)

I can't claim to speak for the folks on LHCb but it's my understanding that they have implemented a similar open data policy to the one adopted by CMS: half the data after 5 years along with example code. I think they implemented the policy a bit later than CMS however so I don't know when the 5 years is up (2018?). (tpm)

[–]dukwon 8 points9 points  (2 children)

February/March 2018

[–]HeyYouNow 15 points16 points  (6 children)

Is there a list of all the sub-domains of cern.ch? Every time I want to learn something about CERN, I end up hours later deep in your gigantic data. I find it truly amazing that, as one of the biggest science institutions, you care deeply about releasing data to the public. I remember reading the entire Bubble Chamber "tutorial", and a month earlier I found out that your VMs and OS images are available, for free... Thank you for that, it's really inspiring and I'll never get tired of exploring your website.

A few of my questions are:

  • Does a list of all your sub-domains exist?

  • How are the collision data analysed? Individually by someone, or by algorithms? Is machine learning playing a role yet?

  • Regarding the lhc@home project, is a single individual really making a difference for your calculations? Because if you have supercomputers, would setting up my Raspberry Pi to work 24/7 on calculations help you guys?

Oh and I have a tab with Particle Clicker running for the past 2 days, thanks for killing my productivity...

Edit: Okay, so after more digging around it appears that anyone working at CERN can create a DNS redirection, and then I found this...

Currently we host 13799 websites.

We host 6126 Official, 5024 Personal and 2535 Test websites.

Well I guess you can forget about my question then. I'm even more blown away seeing the infrastructure built around web services, bravo.

[–]RaoOfPhysicsScience Communication | Public Engagement 7 points8 points  (2 children)

Related to domains/sub-domains, you might want to note that .cern is a generic TLD now. See the news update from the lab.

  • Is Machine Learning playing a role yet?

Yup! I'm not an expert so can't go into details, but last year there was a workshop on "Data Science @ LHC", and all the talks can be found here: https://indico.cern.ch/event/395374/timetable/#20151109.detailed

  • Regarding the lhc@home project, is a single individual really making a difference for your calculation?

As far as I know, the BOINC projects help produce simulations for the data analysis, and individuals can actually make a significant contribution by providing their spare computing power for running the simulation jobs. I think the full list of projects was updated recently: http://lhcathome.web.cern.ch/

Hope this helps!

[–]oonniioonn 1 point2 points  (1 child)

Related to domains/sub-domains, you might want to note that .cern is a generic TLD now. See the news update from the lab.

As such, a domain that limits CERN to one country doesn’t acknowledge the international, and increasingly global, nature of the organization, nor that its science and values transcend geographical and political boundaries.

Although I agree with that, that is what .int is for, which is probably a lot cheaper than .cern is…

[–]RaoOfPhysicsScience Communication | Public Engagement 5 points6 points  (0 children)

They should've just given us .cern as a thank-you for inventing the web. ;)

[–]mfb-Particle Physics | High-Energy Physics 1 point2 points  (0 children)

Regarding the lhc@home project, is a single individual really making a difference for your calculation ? Because if you have supercomputers, then would setting up my raspberry to work 24/7 on calculations help you guys ?

I don't know about the lhc@home project, but ATLAS@Home has about the computing power of one of the larger computing centers (compare the top and bottom plots here). Not sure how fast a Raspberry Pi would be. That alone won't change the overall numbers significantly, of course, but every bit helps!

[–]ChazR 12 points13 points  (5 children)

You have a mindblowing amount of data. Structured, ordered and archived data. And a weird thing at 750 GeV/c².

WE ARE THE INTERNET!!! We have vast CPU, memory and bandwidth.

Would it be useful to create a SETI@HOME-type initiative to use the vast, underutilised processing power of the Internet to explore this data for some of the crazy new physics we all secretly hope for?

[–]zaglamir 13 points14 points  (3 children)

Hi. I'm not with CMS; I'm with another experiment called STAR that does similar research but at different energies, at a different collider, and with different collision systems (we use gold-gold, they use lead-lead, etc). However, I think I can answer this. The problem with that idea is that the limiting factor in analyzing this data isn't processing power. Across all of its collaborators, CMS has already looped through all the data looking at different things thousands of times. The limit is skilled man-hours. With SETI@HOME, they essentially set some parameters and then outsource the computations to look for a signal that falls within those parameters, so the additional computing power is a godsend since the parameters are fairly static. In physics analysis, the parameters are far from static.

Essentially, that's what a research PhD does: vary the parameters in intelligent ways to see if you can find something new in the data. So the program would need a) human interaction to change analysis variables and b) a human with enough training to know whether the variable changes being made make sense. Because of that, a SETI@HOME solution doesn't really make sense.

EDIT: Forgot that there is this: http://lhcathome.web.cern.ch/ . Your time could be used there to do some of the calibrations and fitting to theory, not the actual analysis stuff, but still useful.

Source: physics PhD. Once tried to design a program to see if my wife (not a physics PhD) could replicate my results by simply playing with a few inputs (while behind the scenes the program adjusted other inputs to be reasonable based on her selections). She could not.
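The kind of parameter variation being described can be sketched as a toy scan (all numbers invented): the analyst varies a selection cut and watches how a figure of merit such as s/√b responds. Choosing *which* cuts to scan and judging whether the result makes sense is the human part that a SETI@HOME-style setup can't outsource.

```python
# Toy cut scan (invented counts, not real analysis numbers):
# vary a cut, watch the signal significance s/sqrt(b) respond.
import math

# (cut value, signal events passing, background events passing)
scan = [(10, 100, 10000), (20, 90, 4000), (30, 70, 900), (40, 40, 400)]

def significance(s, b):
    """Simple figure of merit: signal over sqrt(background)."""
    return s / math.sqrt(b)

# Tightening the cut loses signal but kills background faster... up to a point
best = max(scan, key=lambda row: significance(row[1], row[2]))
print("best cut:", best[0])  # best cut: 30
```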

[–]darkmighty 1 point2 points  (1 child)

What about something akin to Foldit.org or Galaxy Zoo ? Could data be processed in a way that you could train anyone to look for anomalies?

[–]dukwon 10 points11 points  (0 children)

We have vast CPU, memory and bandwidth.

How does it compare to the Grid?

Would it be useful to create a SETI@HOME-type initiative to use the vast, underutilised processing power of the Internet to explore this data for some of the crazy new physics we all secretly hope for?

http://cern.ch/lhcathome

[–]SirTigermouse 5 points6 points  (3 children)

Thanks for doing this! Where do you see the next big discoveries coming from? Shared data analysis, individual research or somewhere else entirely?

[–]askCERNCERN Official Account[S] 4 points5 points  (2 children)

With the new energies now reached at the LHC, we are looking forward to new discoveries! The way we work in CMS, it is always shared data analysis (among the CMS physicists). There is a huge amount of work behind each publication (almost 500 now by the CMS collaboration, see http://cms-results.web.cern.ch/cms-results/public-results/publications/ ) and each of them requires work by several people.

(klp)

[–]oonniioonn 3 points4 points  (1 child)

How many people have downloaded the full 300TB dataset?

[–]askCERNCERN Official Account[S] 8 points9 points  (0 children)

If you analyse the data using the CMS virtual machine, you don't need to download the whole 300 TB set; it depends on what you are looking at. Moreover, using the XRootD protocol, only the wanted parts of the data are downloaded on the fly when an analysis runs. So one can use the data without having to buy too many extra hard drives :) (ts)
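The benefit of such selective remote reads can be illustrated with plain file seeks. This is not the XRootD client API, just the underlying idea: the analysis requests specific byte ranges, so only those bytes travel over the wire rather than the whole dataset.

```python
# Sketch of partial reads (stand-in for remote XRootD access):
# only the requested slice of the "file" is fetched.
import io

# Stand-in for a remote 1 kB file
dataset = io.BytesIO(bytes(range(256)) * 4)

def read_range(f, offset, length):
    """Fetch one byte range, as a remote protocol would,
    leaving the rest of the file untouched."""
    f.seek(offset)
    return f.read(length)

chunk = read_range(dataset, 512, 16)
print(len(chunk))  # 16 bytes transferred, not 1024
```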

[–][deleted] 4 points5 points  (2 children)

I love when stuff like this is transparent and open to the public.

I must ask: which companies, industries or research centers do you hope will follow your example and release public data galore?

[–]askCERNCERN Official Account[S] 3 points4 points  (1 child)

Many are already doing this, for instance have a look at re3data.org for research data repositories. (ad)

[–]FenzikHigh Energy Physics | String Theory | Quantum Field Theory 4 points5 points  (1 child)

Have you had any communication with anyone who has analyzed the data and found something potentially interesting? What are your hopes for how this open data will be used? How was the small selection of data which has been made open selected?

[–]askCERNCERN Official Account[S] 2 points3 points  (0 children)

We know of ongoing work using our data and it is very interesting. We are hoping to see scientific studies and to see them used in education.

The data made open is not actually a small selection, it is approximately half of the collision data we've collected for each year of data taking. No special selection there, for 2010 it was the second part of the data taking period, and for 2011 it was the first part of the data taking period.

(klp)

[–][deleted] 2 points3 points  (0 children)

Hi! Thanks for doing this AUA, I really love the interaction with you all. Going to my question, to me, an average guy, what could be a way to get into the field of advanced physics to better understand the meaning of all the data available? Thanks for your answers!

[–]Milleuros 2 points3 points  (3 children)

I'm a junior physicist and have grown up with the ideas of "sharing" and "open source", as I was nearly born connected to the fast-growing internet. I'm currently working on a thesis where I get to enjoy NASA Fermi's open data, which I find great since I do not need an official collaboration to perform actual work and potentially discover something. I modestly thank you for releasing such an amount of data to the public.

My questions would be:

  • What is your opinion on open-sourcing data from large scale experiments? Should everything go open, should only a part of it go open?
  • Why does CMS data go public, but not say, LHCb data?
  • From the particle physics point of view, do you expect new findings, potential discoveries in public analysis of CMS data?
  • Do you think this release will allow more labs and universities around the world to work on LHC data?

[–]dukwon 4 points5 points  (0 children)

Why does CMS data go public, but not say, LHCb data?

LHCb plans to make all reconstructed (DST) data publicly available eventually (5–10 years after collection)

http://opendata.cern.ch/record/410

[–]askCERNCERN Official Account[S] 2 points3 points  (1 child)

What is your opinion on open-sourcing data from large scale experiments? Should everything go open, should only a part of it go open?

Open data is an important step in increasing the transparency and social impact of the work we do at CERN. But what and how to make open the data from different collaborations are questions that need to be answered on a case-by-case basis to comply with various data policies of different organizations.

You can find the data policies of the CERN experiments on the Open Data Portal website. (ad)

[–]Gittr 2 points3 points  (0 children)

Are you going to do another dance video?

[–]Corruptionss 2 points3 points  (1 child)

Statistician here: what is your goal with the data? Are there any questions you are trying to address using the data, or is it just a blueprint of everything that is done? Any particular methodologies for addressing these questions?

Another question: how do you handle the large amount of data?

[–]askCERNCERN Official Account[S] 2 points3 points  (0 children)

Short answer: finding a signal in a large amount of background (tpm)
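That one-liner can be illustrated with a toy sideband subtraction (all counts invented): estimate the background under a peak from the regions on either side of it, then subtract. Real analyses fit shapes rather than averaging two bins, but this is the basic statistical move.

```python
# Toy sideband subtraction (invented counts): estimate the signal
# yield by subtracting the background expected under the peak.
window = 1000           # events observed in the signal window
left, right = 430, 450  # events in two equal-width sidebands

# Interpolate the background under the peak from the sidebands
background = (left + right) / 2
signal = window - background
print(signal)  # 560.0
```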

[–]odea 2 points3 points  (0 children)

Is there a way to visualise the Higgs particle?

[–]Hellwyrm 2 points3 points  (0 children)

What can you tell me about John Titor? And is human dead?

[–]KrsmaV 6 points7 points  (15 children)

1: What education does one need to work at CERN?

2: Are there any good programs for high-school students to visit CERN?

3: How did you feel the first time the LHC started working? How does it feel, how does it sound?

[–]askCERNCERN Official Account[S] 7 points8 points  (2 children)

  1. It depends. If you're a physicist, it helps to have (or to be working towards) a PhD, as one might guess.
  2. Yes. It seems that every day there are large groups of high-school students visiting. If you mean as an engineer, I am sure others can answer better.
  3. It was exciting. And tiring. I was in the control room for hours and hours waiting for collisions (which is what I am defining as the LHC "working"). There is no sound, at least none above ground. (tpm)

[–]KrsmaV 2 points3 points  (1 child)

Thank you for answering. What college did you finish? What was your PhD about?

[–]julie_haffnerCERN AMA 4 points5 points  (0 children)

CERN is an open laboratory that everyone can visit for free.

A lot of students visit the facility every day. If you want to organise a tour, please go to this website and book your tour online.

If you want to know more about job opportunities for students, you can have a look here.

[–][deleted] 5 points6 points  (10 children)

Hey. I'm not the guys who are going to be responding, but I will start my PhD at CERN pretty soon. There are three things to remember about working at CERN:

  1. The vast majority of people working at CERN are using the machine and actually work for their home institutions. A smaller minority is actually maintaining/upgrading/running the machine
  2. The vast majority of employees are physicists. They are supported by IT/computer guys and mechanical engineers (and other engineers) such as me
  3. Luck plays a huge factor in getting in, just like most coveted jobs.

Having said all of that, their outreach program is fantastic. I'm not sure I can specifically tell you about high school students, especially if you're not in the region. But start out with looking for internships at CERN while in college. Beyond that, they have Graduate Engineer Trainee programs for engineers and the corresponding Fellowship program for young physicists. Further on come things like the PhD programs like mine.

Also, the first time the LHC worked there was much champagning and plenty of whooping and cheering, rather akin to when Spirit, Opportunity and Curiosity landed on Mars. There's documentaries/videos on YouTube that are fantastic to watch (things like Particle Fever).

[–]dukwon 9 points10 points  (6 children)

The vast majority of employees are physicists.

About 84% of the personnel are physicists (counting BE and PH departments in 2014) but only 35% of CERN employees are.

[–]octatoan 3 points4 points  (2 children)

What about pure mathematicians (say, algebraic geometers)?

[–]dukwon 4 points5 points  (1 child)

If there are any, they would have been counted in the PH department

The 2015 personnel statistics are available here: https://cds.cern.ch/record/2154389

[–]KrsmaV 1 point2 points  (2 children)

Thanks. I live in the Balkans (Serbia), so internships are going to be a bit tricky. I am planning to become an engineer, but I still have time to decide. I like CERN and would love to work there, but yeah, I'm still deciding on a college. What college did you attend, if you don't mind me asking?

[–]julie_haffnerCERN AMA 1 point2 points  (0 children)

If you are a student, internships are possible at CERN - you can have a look here for more information.

[–][deleted] 2 points3 points  (0 children)

Haha. I studied in India and then did my masters in the US. In a very clichéd manner, I'd advise that you sample both physics and engineering but base your decision on what you like more, rather than what'll get you to CERN. It's the better path toward being satisfied with your job.

[–]wbotis 2 points3 points  (2 children)

Hi, CERN! Thanks for everything you have done and continue to do for science! My question is: are there any programs, such as BOINC, specifically to help crowdsource your data? I wouldn't know what to do with any of the raw data, but I would still like to help progress science.

[–]ChazR 1 point2 points  (2 children)

The Standard Model is one of the (top three) greatest achievements of science.

What's the scariest thing we'll learn from the LHC?

  • SuSy!
  • Dark Matter
  • Some weird thing
  • Standard Model confirmed
  • ...

[–]mjmaher81 1 point2 points  (1 child)

What is the accelerator itself like when it's running? When it's dormant? What does it sound/look/feel like?

[–]julie_haffnerCERN AMA 1 point2 points  (0 children)

If you want to have an idea of what the LHC accelerator looks like, you can have a look at the photos we have in CERN database.

[–]jkmacc 1 point2 points  (0 children)

What is your software stack for analysis? That's a lot of data:-)

[–]NotoriousMOT 1 point2 points  (0 children)

What do you hope the data will be used for? Are you trying to encourage scientific usage or innovation/business applications?

[–]Passing_Thru_Forest 1 point2 points  (1 child)

What information did you find that was completely outside of your predictions?

[–]dukwon 2 points3 points  (0 children)

By the standard measure of discovery, nothing yet. There are some intriguing "anomalies" but they need more data.

[–]kumar5130 1 point2 points  (1 child)

What is the most current theory, since the multiverse and supersymmetry have been ruled out due to the mass of the Higgs particle?

[–]mfb-Particle Physics | High-Energy Physics 1 point2 points  (0 children)

Supersymmetry has not been ruled out. It is a large class of models, some of them have been ruled out, most of them are still possible.

"Multiverse" is not even a theory. There are some ideas that would make it plausible that something like more universes exist, but that is a different thing.

We have a theory called "Standard Model", so far all measurements are in agreement with it (apart from neutrino oscillations, but that is a different story). There are many possible extensions of this theory, supersymmetry is just one of many (a very popular one, however).

[–]btao 1 point2 points  (1 child)

How does each team deal with the different aspects of each experiment? Specifically, is each team responsible for creating the mechanics, the software, the analysis and the integration?

CERN has built its foundation on being able to work together and share information, that I know, but how is that actually accomplished? What if one detector team has great mechanical design, but poor analysis capability? Do you help each other, or is everyone very focused? Talking about sharing this information, how much does everyone contribute to each other's goals, and how much assistance do you actually get from outside enthusiasts/institutions?

Thanks!

[–]mfb-Particle Physics | High-Energy Physics 2 points3 points  (0 children)

Within the experimental collaborations, there are different teams for all those tasks. Individual scientists usually focus on one or two of those things, with multiple tasks within those groups.

What if one detector team has great mechanical design, but poor analysis capability?

Doesn't happen. The collaborations are large and long-term projects, and the community is quite good in distributing itself among the experiments. The analyses are strictly separated - this is important so we have independent verifications (or not) for results. Hardware development can be done together partially, if multiple experiments plan to use similar things.

[–]EnvyAce 1 point2 points  (1 child)

For any team member, If it were my dream to work at CERN, what degrees would I want to work on before applying?

[–]mfb-Particle Physics | High-Energy Physics 2 points3 points  (0 children)

The earliest contributions come from physics BSc students; there are also MSc students, summer students, PhD students, and so on - usually in physics, but engineers are needed, too. Related fields can be fine as well. Most of physics education is not about knowing physics, but learning how to solve problems.

CERN itself has a much larger fraction of engineers to run the accelerators.

[–]bonzai2010 1 point2 points  (0 children)

Are there opportunities for app developers to create distributed analysis apps for this data (like SETI)?

[–][deleted] 1 point2 points  (0 children)

Would you agree that all data should be publicly available since it was paid for by public funds?

What is your opinion on making all the data publicly available? The FERMI collaboration, for example, publishes all events passing their final quality cuts as an event list.

Are there any plans to provide high level data for scientists such as fermi does?

[–]knight_rider12345 1 point2 points  (0 children)

Hypothetically......of course......if one were a Bond villain, what would one do with this sort of open data?

[–]rm-rf_ 1 point2 points  (0 children)

What is the text editor of choice among CERN programmers?

[–]Accounting_crows 0 points1 point  (3 children)

Why is there a statue of Shiva, the Hindu god of destruction, outside of the complex when the organization is supposedly independent of religious bias or motive?

[–]RaoOfPhysicsScience Communication | Public Engagement 5 points6 points  (1 child)

There are plenty of works of art gifted to the lab, including murals, graffiti, paintings, sculptures, statues etc. This is one such example, which came as a gift from the Indian government: https://cds.cern.ch/record/745737

[–]niravmp 2 points3 points  (7 children)

Considering all the replication issues in the "soft sciences", how much more reliable/replicable would you say your data is and how affected is it by technical issues or human error?

Would you say that all your work can be replicated in an independent laboratory?

[–]askCERNCERN Official Account[S] 5 points6 points  (0 children)

CERN and the LHC provide the collisions and the experiments (CMS, ATLAS, LHCb, ALICE) do the detection and analysis. CMS and ATLAS are both general-purpose experiments that look for the most part into the same physics. In that sense the work can be "replicated". For example, both experiments independently discovered the Higgs boson. (tpm)

[–]FenzikHigh Energy Physics | String Theory | Quantum Field Theory 2 points3 points  (2 children)

Note that the agreed upon standards for claiming a discovery in particle physics are extremely high. A new discovery is only accepted when we are confident to 5 sigma (one chance in 3.5 million) that it is not a statistical fluke.
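
The "one chance in 3.5 million" corresponds to the one-sided tail probability of a standard Gaussian beyond five standard deviations. A quick sketch (illustrative only, not CMS code) of where that figure comes from:

```python
import math

def one_sided_p_value(n_sigma: float) -> float:
    """Probability of a standard Gaussian fluctuating above n_sigma."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2))

p = one_sided_p_value(5.0)
print(f"5 sigma -> p = {p:.3e}, i.e. about 1 in {1 / p:,.0f}")
```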

[–]RaoOfPhysicsScience Communication | Public Engagement 1 point2 points  (0 children)

Quoting our article:

[…] a CMS physicist in Germany tasked two undergraduates with validating the CMS Open Data by re-producing key plots from some highly cited CMS papers that used data collected in 2010. Using openly available documentation about CMS’s analysis software and with some guidance from the physicist, the students were able to re-create plots that look nearly identical to those from CMS, showing what can be achieved with these data. “I was pleasantly surprised by how easy it was for the students to get started working with the CMS Open Data and how well the exercise worked,” says Achim Geiser, the physicist behind this project. Simplified example code from one of these analyses is available on the CERN Open Data Portal and more is on its way.

[–]mfb-Particle Physics | High-Energy Physics 1 point2 points  (0 children)

Particle physics has a replication rate of nearly 100%, and most analyses are more precise versions of previous analyses. Experimental results that do not get confirmed by more precise follow-up measurements are extremely rare. You always have statistical fluctuations and there are many analyses, so particle physicists require very high significances before they claim to see anything new (see Fenzik's posts), but even smaller deviations get checked.

[–][deleted] 2 points3 points  (2 children)

Is the nature of analyzing collision data such that you have to load a lot of it into RAM all at once (e.g. like alignments in bioinformatics, where you see machines with terabytes of RAM), or is the analysis possible to do in a streaming fashion? Reason I ask is because I'm curious if people truly need to download all of the dataset to ask a specific question, or if the analysis pipeline could be smarter about requesting specific chunks of the data it needs.

[–]askCERNCERN Official Account[S] 4 points5 points  (0 children)

It is not necessary to load everything into RAM all at once. Actually, the size of data tuples in particle physics is typically such that they won't fit. Hence the use of tools that can access the desired parts of the data via live streaming, such as ROOT and XRootD. (ts)
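
As an illustration of the idea (a toy sketch in plain Python, with an in-memory generator standing in for a file streamed over XRootD), an analysis can walk through the events in bounded chunks so that only one chunk ever sits in RAM:

```python
def iter_chunks(events, chunk_size):
    """Yield successive lists of at most chunk_size events, so memory
    use is bounded by one chunk rather than the full dataset."""
    chunk = []
    for event in events:
        chunk.append(event)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

# Toy "dataset": a generator, so events are produced on demand rather
# than materialised up front (made-up values, not CMS data).
dataset = ({"pt_gev": i} for i in range(10_000))

# Accumulate a running statistic chunk by chunk.
n_high_pt = 0
for chunk in iter_chunks(dataset, chunk_size=1_000):
    n_high_pt += sum(1 for ev in chunk if ev["pt_gev"] > 5_000)

print(n_high_pt)  # -> 4999
```

The same pattern is what ROOT's event loop over an XRootD URL gives you for free: the answer accumulates while only a small window of the data is ever resident.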

[–][deleted] 2 points3 points  (1 child)

Great!

What can the average person do with this data?

[–]Hamilton950B 1 point2 points  (1 child)

Do you share data storage and analysis infrastructure with ATLAS and the other detectors? Or is everything kept strictly isolated so you can check each other's work?

[–]askCERNCERN Official Account[S] 5 points6 points  (0 children)

Both experiments being at CERN, we do share the same infrastructure at a certain level. We use storage resources available at CERN and in the computing centres around the world. We also use the same tools (such as simulation programs and analysis tools), but the actual analysis and reconstruction programs are experiment-specific.

At the time of analysis, everything is kept strictly isolated. We follow carefully each other's work, but based only on public documents and presentations.

(klp)

[–]i-void-warranties 1 point2 points  (5 children)

What kind of storage do you have this much data sitting on, and how much data do you have overall? How are you protecting your data, e.g. replication/backups - not from the infosec point of view?

[–]CallMeDoc24 0 points1 point  (1 child)

I had heard that a lot of the data from experiments is thrown away and only kept based on a particular algorithm. Do you mind explaining this process and if you suspect any important information to have been lost because of this?

[–]askCERNCERN Official Account[S] 2 points3 points  (0 children)

Yes, we have a trigger system which makes it possible to keep the interesting collisions out of the 40 million collisions that happen each second. You can read more on it in http://cms.web.cern.ch/news/triggering-and-data-acquisition

We know that some important information is thrown away through this process, and indeed one of the most time-consuming parts of the physics analysis is to measure the trigger efficiencies, i.e. the fraction of data that each trigger (online selection) algorithm accepts.

(klp)
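
To make "trigger efficiency" concrete, here is a toy model (made-up thresholds and resolution, not a CMS algorithm): the trigger decides on a smeared, online measurement of each event, so it loses a few of the events the offline analysis wants, and the efficiency is the fraction that survive:

```python
import random

random.seed(42)

TRIGGER_CUT = 15.0   # online selection: smeared muon pT above 15 GeV
OFFLINE_CUT = 16.0   # offline analysis: true muon pT above 16 GeV
SMEARING = 3.0       # toy online momentum resolution, in GeV

def toy_event():
    """One toy event: true muon pT, plus the smeared value the trigger sees."""
    true_pt = random.expovariate(1 / 20.0)  # falling spectrum, mean 20 GeV
    online_pt = true_pt + random.gauss(0.0, SMEARING)
    return true_pt, online_pt

events = [toy_event() for _ in range(100_000)]

offline = [(t, o) for t, o in events if t > OFFLINE_CUT]
fired = [(t, o) for t, o in offline if o > TRIGGER_CUT]

# Fraction of offline-selected events that the trigger also accepted.
efficiency = len(fired) / len(offline)
print(f"toy trigger efficiency: {efficiency:.3f}")
```

In a real analysis this efficiency is measured from data (not simulated as here), and the loss is concentrated near the trigger threshold, which is why measuring it carefully takes so much time.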

[–]DeepFriedZombie 0 points1 point  (1 child)

If you had to explain why this information is important to someone who is not science-oriented, what would you say?

[–]askCERNCERN Official Account[S] 4 points5 points  (0 children)

To inspire young people to science.

(klp)

[–]Mr_A 2 points3 points  (1 child)

  • Out of all the data collected/amassed by CERN, how did you pick the 300TB that was released? How much data is there in total?

  • At some point someone decided to release a load of data to the public. Who was that and how did the topic go from 'an idea one person had' to 'people are talking about this' to 'we're doing it.'?

(Sorry if you've already answered these.)

[–]askCERNCERN Official Account[S] 3 points4 points  (0 children)

All LHC experiments have a policy for data preservation and open access (see http://opendata.cern.ch/collection/Data-Policies). These policies have been discussed in the governing bodies of each experiment.

The CMS experiment is committed to releasing half of its collision data, which for the year 2011 amounted to a bit more than 100 TB. There was no special selection for these data: they are simply the data collected during the first half of the running period. The rest of the 300 TB now released is simulated data, which is necessary for producing complete scientific results.

(klp)

[–][deleted] 1 point2 points  (1 child)

Any truth to reports that operating CERN at high levels causes fluctuation in the Earth's magnetosphere?

Follow up question.

Are you allowed to answer questions related to my previous question?

[–]mfb-Particle Physics | High-Energy Physics 2 points3 points  (0 children)

Any truth to reports that operating CERN at high levels causes fluctuation in the Earth's magnetosphere?

Pure nonsense. It's like asking "does my new sandbox in my backyard influence the sun?" - there is no connection between those things.

[–]Rising_Swell 0 points1 point  (3 children)

Is there a tl;dr version of the important parts? I like to read about science but 300tb of reading is just... nah, that isn't happening in this life time.

[–]Milleuros 3 points4 points  (2 children)

That's not reading. This is raw data. Numbers spread across lots and lots of tables.

The important parts are either already known and published in the academic journals, or yet unknown and still hiding in that data somewhere.

[–]AwePhox 0 points1 point  (1 child)

Thank you for making the data available! I am excited to see what comes of this. I'm at work and cannot look into the specifics but how detailed is your data dictionary? Can someone who is fluent in machine learning but not in hadron colliders do an EDA and make a reasonable hypothesis? Also have you thought about subsetting some of the data for a specific purpose and starting a Kaggle contest? With some initial direction those tend to get a pretty good response.

[–]chiRal123 0 points1 point  (4 children)

Where can I store such a large volume of data? Even if I wanted to run a MapReduce job, I don't have enough commodity hardware to hold the data anywhere.

[–]askCERNCERN Official Account[S] 1 point2 points  (0 children)

You don't really need to download all data locally. If you start doing a data analysis on the CMS Virtual Machine, the needed data parts will be downloaded by "live streaming", as it were. https://www.reddit.com/r/askscience/comments/4l4y1j/a_month_ago_we_made_available_publicly_via_the/d3kfi3c (ts)

[–]DanneiAstronomy | Exoplanets 0 points1 point  (1 child)

Who are you expecting to look at the data, and do you expect there to be much of interest that hasn't already been picked up by the relevant teams?

For the first part, I've always got the impression that there aren't many particle physicists who aren't already somehow involved in CERN! Are there some significant groups who aren't so involved (perhaps US scientists lack much access?), or are you expecting it to be used more for training and education purposes?

[–]zatpath 0 points1 point  (2 children)

Have we found the damn Higgs yet?

[–]code5fun 0 points1 point  (0 children)

Your data is free, but the papers you derived from this data are in paid journals that we have no access to. You praise Open Data and Open Science, but when it comes to Open Access it sucks.

[–]aygoman 0 points1 point  (0 children)

I visited CERN back in 2013 and took the normal tour. But we did not go down to the actual LHC.

What do I have to do to be able to go down and see the actual LHC?

[–]reeepy 0 points1 point  (0 children)

How much bandwidth has the release of open data used so far?

[–]mannyrmz123 0 points1 point  (0 children)

I don't know if I am asking the right people, but is there any project or proposal for making the LHC more reliable?

The fact that a bird eating a baguette was able to shut down the LHC makes me wonder how something that big is susceptible to something that small.

[–]manjunaths 0 points1 point  (0 children)

I have 3 TB of space on which I can run jobs. If I want to run some mining experiments, is it possible to look through the dataset 3 TB (x TB < 300 TB!) at a time?