1st Keynote : No (Open) Science without Data Curation: Five lessons from the study of Data Journeys (Sabina Leonelli)  

--> Open Research as an opportunity, including scientific infrastructures, governance and how this should be credited and disseminated:   https://www.datastudies.eu/publications https://icsu.org/cms/2017/04/open-data-in-big-data-world_long.pdf

Open Research on three aspects:
  1. Global Scope
  2. Systemic Reach
  3. Local Implementation
FAIR data improves your research at many levels. BUT there are requirements for making data FAIR, such as awareness:
Awareness of Open Science and its tools is still very low in the scientific community (EU Working Group on Education and Skills under Open Science, 2017)
https://www.garnetcommunity.org.uk/sites/default/files/GARNet_Paper_nplants201786-1.pdf
https://www.datastudies.eu/publications 
It is important for researchers to have some knowledge of the tools/methods for making their data FAIR. Most importantly, some people (us) can share expertise about these tools/methods with them, and help ease the confusion that researchers might feel when putting FAIR into practice.
Focus on qualitative data:
  1. Databases, example of plant science
  2. Data Re-use cases 
Data journey example : TAIR (not FAIR) https://www.arabidopsis.org/
  1. preparing specimens
  2. preparing and performing imaging
  3. data storage and dissemination
  4. .... 
  5. ...
  6. Analysis
Epistemic troubles :
- the research data (RD) collected represent highly selected data types
- selection based on political-economic conditions of sharing
- peer review structure unclear
- misalignment between IT and research needs
- no sustainable plans for maintenance
- ....
Lessons learnt, in general terms
  1. Context-specific data curation is key to data re-use
  2. Long-term maintenance is key to trustworthiness (updates, long-term policy)
  3. Which data and why?
  4. data & materials (connect digital data with data in the physical world)
  5. Role of ethics, humanities & social sciences in data management (increase quality and reusability)
http://press.uchicago.edu/ucp/books/book/chicago/D/bo24957334.html
--------------------------------------------------
-------------------------------------------------

PLENARY : RESEARCH PAPERS

Measuring FAIR Principles to Inform Fitness for Use

Carolyn Hank from University of Tennessee

past paper on  http://datacurationprofiles.org/ => 10.1002/pra2.2016.14505301046
"Fitness for use" > focus on the "reusable" aspect of FAIR
Method : interview
Job-related demographics with questions such as 'what is your current job title?', 'how many years have you worked in this institution?', 'how many years have you been working in the discipline?' etc.
Findability > 'how did you find the data?', 'DOI?', 'metadata?'
Accessibility > 'how did you access the data?', 'open format?', 'was the data free?', 'was the metadata accessible?'
Interoperability > 'was the data in a usable format?', 'encoded?', 'machine-actionable?'
Reusability > 'were the metadata sufficient?' etc.
Potential implications : data can be FAI, but R requires more research
=> create new knowledge of how scientists access and use data
=> producing a framework to enable re-use
----------------------------------------

Giving datasets context : a comparison study of institutional repositories that apply varying degrees of curation

(Amy Koshoffer, Cincinnati, USA) 

questions :
1. How do the metadata vary for each institution?
2. completeness of metadata
3. curated datasets do have more documentation
4. DOIs more with curated datasets
5. keywords
What is curation?
- appraisal/selection
- check/run files : includes code review, review of sensitive information (damn, she speaks too fast!)
4 universities : one repository per institution
20 datasets per repository. Comparison of mandatory vs. non-mandatory metadata for each university
Results
Question 1:  
- all universities use a title field in their metadata (for instance), but each of them understands it differently
- all datasets had above the minimum metadata required
Question 2:
- 53% completeness, but different for every institution she looked at
- use of the Mann-Whitney U test : https://fr.wikipedia.org/wiki/Test_de_Wilcoxon-Mann-Whitney
- optional supplementary metadata were not used, in either curated or uncurated repositories
--> Does curation really have an impact then? Not sure that the curation service does.
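To make Question 2 more concrete, here is a minimal sketch of how a completeness score per dataset could be computed and how the Mann-Whitney U statistic compares two groups of such scores. The field names and the scores are invented for illustration; they are not the study's data, and the speaker did not show her method in this form. The sketch only computes U, not a p-value.

```python
def completeness(record, fields):
    """Fraction of the given metadata fields that are non-empty in record."""
    filled = sum(1 for name in fields if str(record.get(name) or "").strip())
    return filled / len(fields)


def mann_whitney_u(xs, ys):
    """U statistic of xs against ys: 1 per pair with x > y, 0.5 per tie."""
    return sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in xs
        for y in ys
    )


# Hypothetical field list and record, for illustration only.
FIELDS = ["title", "creator", "description", "keywords"]
example = {"title": "Soil cores 2016", "creator": "", "keywords": "soil"}
print(completeness(example, FIELDS))

# Hypothetical completeness scores for two groups of datasets.
curated = [0.8, 0.6, 1.0]
uncurated = [0.4, 0.6, 0.2]
u_curated = mann_whitney_u(curated, uncurated)
u_uncurated = mann_whitney_u(uncurated, curated)
# The two U values always sum to len(xs) * len(ys).
print(u_curated, u_uncurated)
```

A U value close to 0 or to len(xs) * len(ys) means one group almost always scores higher than the other; a value near the middle means the groups overlap heavily.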
Question 3:
- curation does have an impact on documentation
Question 4:
- all support DOIs, but in different ways. There might be other factors to take into account besides the curation process
Conclusion :
- Curation process may have had a measurable impact, BUT more factors may be impactful
- Curation > more documentation & more readme.txt
---------------------------------------------------------------------------------

Complexities of digital preservation in a virtual reality environment, the case of Virtual Bethel 

Indianapolis (IUPUI), Angela Murillo

"CHUUUUUUUUURCH!"
E. Blumer, 2018
Creation of a VR space for a church > the question is : how do we preserve this VR space ?
-  at the time, there was an archive (documents/physical objects), but recently the building was sold
- 3D virtual space of the church + learning space (history of the building)
https://comet.soic.iupui.edu/bethel/
Preservation challenges :
- nature of 3D data
- VR operation
Types of data : pre-prod / prod / post-prod + files that link the three phases
Updates : 40 GB to 60 GB (mainly for the creation of learning spaces)
The problem is that, until now, there has been no VR object preservation framework
Use of NDSA Standard for Levels of Digital Preservation
http://ndsa.org/activities/levels-of-digital-preservation/
Essentially : Work in progress .... progress ... progress.... progress... progress...
-------------------------------------------------------------------------
--------------------------------------------------------------------------

PARALLEL SPEAKS

ENABLING AND MEASURING FAIR (Fantin)

Are research data sets FAIR in the long run -

Dennis Wehrle, Freiburg 

Spoiler alert : there's no definite answer to this question
Picked 10 public repositories from the 1800+ listed on Re3data.org
For each of these 10 repositories, selection of 10 datasets
Limitation for the datasets :
- Open
- etc. (too fast)
Use of Harvard's File Information Tool Set (FITS), which contains 12 analysis tools.
Test dataset : 237 GB to analyse, which took 5h20 of processing; that works out to 85 days of processing for the whole sample (100 datasets), so they took shortcuts (too fast to note)
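The 85-day figure presumably comes from scaling the test run linearly. A quick sketch of that arithmetic, assuming a constant processing rate (my assumption, the speaker did not detail the calculation); the implied total size is derived here and was not stated in the talk.

```python
def throughput_gb_per_hour(size_gb, hours):
    """Processing rate observed on a test run."""
    return size_gb / hours


def estimated_hours(total_gb, rate_gb_per_hour):
    """Time needed for total_gb at a constant rate (linear scaling assumed)."""
    return total_gb / rate_gb_per_hour


# Figures from the talk: 237 GB analysed in 5h20.
rate = throughput_gb_per_hour(237, 5 + 20 / 60)

# Working backwards from the quoted 85 days gives the sample size that
# estimate implies; this number was not stated in the talk.
implied_total_gb = rate * 85 * 24

print(f"rate: {rate:.1f} GB/h")
print(f"implied sample size: {implied_total_gb / 1000:.1f} TB")
```

The same two functions work in the forward direction: given a known total size, `estimated_hours(total_gb, rate)` returns the expected processing time.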
FITS result : no result / single result / conflicting result / unknown result
Aggregation of identically named formats
Unification of "unknown result"; there were still 28 conflicts (2150 files) to post-process
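A rough sketch of how the four outcome categories could be assigned per file. The input here is a hypothetical plain list of format names, one per tool that answered; it is not the actual FITS output format, and the exact meaning of "unknown result" is my interpretation (tools answered, but none named a format).

```python
def classify(identifications):
    """Sort one file's tool answers into the four categories from the talk.

    identifications: hypothetical list of format names, one per tool.
    Names are compared case-insensitively so that identically named
    formats are aggregated.
    """
    names = {n.strip().lower() for n in identifications if n and n.strip()}
    if not names:
        return "no result"
    known = names - {"unknown"}
    if not known:
        return "unknown result"
    if len(known) == 1:
        return "single result"
    return "conflicting result"


# Invented examples, for illustration only.
samples = {
    "figure1.png": ["PNG", "png"],
    "table.csv": ["CSV", "Plain text"],
    "blob.bin": ["unknown"],
    "empty.dat": [],
}
for filename, answers in samples.items():
    print(filename, "->", classify(answers))
```

Files that end up in "conflicting result" are the ones that would need manual post-processing.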
In the end : approx. 145 formats identified (lower-bound estimate) - a few files were still unidentified
images : png/jpg
Text encoded format : CSV, XML, RTF, HTML
Script/source code : readable with text editor, base64-encoded in XML, JavaScript in HTML, reference to external data in (X)HTML
Problematic "text files" : unknown binaries, MATLAB, SPSS, octet stream
SUSTAINABILITY :
Formats division :
- high probability (plain text/PDF/A)
- medium probability (open formats such as OpenOffice)
- low probability (.doc, proprietary formats)
Applied from data formats to datasets : most of the datasets had LOW PROB (approx. 3/4)
Advice to dataset creators : change their formats
Result : migrating a single file format may not be sufficient. As a matter of fact, most datasets are heterogeneous.
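One way to read the step from formats to datasets is that a dataset inherits the rating of its weakest format. That rule, the format labels and the choice to rate unlisted formats as low are all assumptions for this sketch; the talk went too fast to note the actual method.

```python
# Ratings ordered from worst to best.
ORDER = ["low", "medium", "high"]

# Example ratings taken from the three divisions in the notes;
# the exact format labels are hypothetical.
FORMAT_RATING = {
    "plain text": "high",
    "pdf/a": "high",
    "opendocument text": "medium",
    "microsoft word (.doc)": "low",
}


def rate_dataset(formats):
    """Rate a dataset by its weakest file format (assumed rule).

    Formats missing from FORMAT_RATING are treated as low.
    """
    if not formats:
        raise ValueError("dataset has no files")
    ratings = [FORMAT_RATING.get(f.strip().lower(), "low") for f in formats]
    return min(ratings, key=ORDER.index)


# A heterogeneous dataset: one weak format pulls the whole dataset down.
print(rate_dataset(["Plain text", "PDF/A", "Microsoft Word (.doc)"]))
```

Under this rule, migrating one format only helps if it was the sole weak one in the dataset, which matches the remark that most datasets are heterogeneous.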
Lesson learnt :
- Data services shouldn't refuse "bad" (poorly ranked) file formats, but should help researchers create workflows to embed them in the LT preservation process. Involvement of the dataset creators is necessary.
- the tools mentioned (FITS) have weak support

Enabling FAIR Data in the Earth and Space Sciences

Shelley Stall, American Geophysical Union

AGU's position on data tends to follow the FAIR principles
A survey was taken; the top four issues, unsurprisingly, are the following :
- data complexity
- finding relevant existing data
- TOO
- FAST
Storytelling : a student had his computer stolen, and the data was only on it. Later the publication was retracted because of this (because the data hadn't been deposited anywhere)
A new funder's grant is coming : to get it, your data has to be FAIR
AGU service :
- streamline data policies
- help researchers find support
- DMP support
- etc.
Data Management Training Clearinghouse : bit.ly/DMTC_events :  http://dmtclearinghouse.esipfed.org/ (not an AGU project, but a community project) => online learning resources.
Face-to-face meeting : rd-alliance.org
Include your organization sstall@agu.org
---------------------------------------------------------
---------------------------------------------------------

PARALLEL SPEAKS

Cross-institutional and national data services (Eliane)

Lisa R. Johnston - Data Curation Network: A cross Institutional Staffing Model for curating research data
- Building the data curation network 
- all universities in USA
Idea: collaboratively sharing data curation staff
- How would we deal with conflicting policy issues?
- What do researchers actually need our help with? Will they care if curation is distributed?
- Can I trust someone else to curate our data? What about quality control?
Start with: 9 institutions (all of them contributing curators), 19 data curators, 1 project coordinator, 1 program director, 8 DCN representatives, 2 admin leads 
Day 1: Business Meeting
Day 2-3 : Curator Training/Network
Process: Ingest, Appraise and Select, DCN, Facilitate Access, Preserve Long-term
DCN: Review, Assign, CURATE, Mediate, Approve
- Check files and metadata
- Understand and run files
- Request missing information 
- Augment metadata
- Transform file formats
- Evaluate for FAIRness 
= CURATE
Assessment:
Making everything available: British Library Research Services and Research Data Strategy : Rachael Kotarski, British Library
- New department : everything available "research services"
- change management portfolio
Research data strategy: make research data business as usual (this is not the case at the moment); users will be able to use research data via tools
http://blogs.bl.uk/digital-scholarship/2017/08/announcing-the-new-british-library-research-data-strategy.html
Four themes:
This cannot be done alone!!!
International Research Infrastructure - funder or partner ? Angeletta Miranda Leggio (ANDS)
Working together with other existing groups
A lot of collaborations on local levels
Funded projects like : Open Access to Marine Data, Open River, PetaJakarta
Do you see ANDS as funder, provider or partner?
Too fluffy... I do not really know what to make of it
-------------------------------------------------------------------------------------------
LUNCH. It was really good.
---------------------------------------------------------------------------------------------

Minute madness

  1. Metro Fun - Train the Trainer R. Schneider (vote 1) 
  2. Of coooooooooose! The Bible holds the answer to eeeevrything!
  3. There goes my time....juggling during RDM trainings.
  4. Data Citation in Social Sciences (to look at)
  5. FDMentor www.forschungsdaten.org/index.php/FDMentor (more than one minute)
  6. Maredata - uanc apon a taim, thea wea several (best accent!) - RDM Iberia
  7. HODs/rd: research data harvester based on repositories ...Gugeell=Google
  8. Holistic RDM service Hannover
  9. Jisc RDM toolkit for international community (expörts in se field)
  10. Grace : exploring the cost and scalability of research data management services (Göttingen) https://www.sub.uni-goettingen.de/en/projects-research/project-details/projekt/grace/  (2 votes)
  11. Building a research data management training community (efaluation foorm)
  12. How federated research data infrastructures work 
  13. Crosswalk - Resurrecting data back from the dead
  14. Long-tail of data
  15. Agile data eco-system
  16. Data sharing workflow for large datasets with globus (the shortest presenter of the posters)
  17. Data Processing Pipeline - Finnish National Preservation Service (a little bit taller than the speaker before) (to see)
  18. Defining Library Capacity for Big Data curation (the tallest presenter of all the posters) (to see)
  19. Research data management courses : overview and gap analysis
  20. scientific data science service at Brown University (to see) 
  21. Springer Nature Research Data Service (äreeund=around) (*buuuuuuh*)
  22. Curriculum RDM at Toronto University
  23. Supporting Open research using KiltHub https://kilthub.figshare.com/
  24. RDM for phd (icecreeeeeeam!!!!!)
  25. Preservation of Canadian research data (service model)
  26. scaling up data management services with metadata in gene sequencing (to see) 
  27. surveying data management practices among neuroimaging researchers (to see)
  28. Forsbase (HER NAME IS ELIANE!!!!!)
  29. New online course, deliver RDM services from DCC
--------------------------------------------------------------------------------------------

Demonstrations (Fantin and Eliane go to the same session, since we know DMPonline by heart)

The Arctic World Archive 
https://www.piql.com/arctic-world-archive/
Piql is a Norwegian preservation company
Digital vault designed to protect the most valuable data from wars, catastrophes and cyberattacks.