IPCC FAIR Tutorial for Authors: All in One View

Content from IPCC FAIR Background

Last updated on 2024-10-25

Estimated time: 12 minutes

Overview

Questions

What are the fundamentals of producing a FAIR IPCC Assessment Report?

Objectives

Learn the FAIR principles and the motivations behind them
Learn about general Research Data and Software Management practices
Learn about storytelling and visualisation

FAIR data principles at IPCC

Motivation for archiving code, figures, input data and metadata; FAIR guidance.

The experience in AR6 and its shortcomings

Including some examples.

Fundamentals of Sustainable Research Software

Software Management Plans (Netherlands eScience Center)

Fundamentals of Research Data Management

The Turing Way tutorial

Models of interactive products:

Storytelling, layered storytelling, static vs dynamic, GIS-based models.

Content from Research Data and Software Management

Last updated on 2024-11-15

Estimated time: 11 minutes

Overview

Questions

What is research data?
What is research software?
Why is it important to properly describe, protect and share research data and software?

Objectives

Understand the importance of disseminating research data and the code used to generate it
Understand the benefits of a Research Data Management plan (via The Turing Way)
Understand the difference between research code and research software, and the benefits of a Software Management Plan

Research Data Management

Climate science is of significant public interest, as it affects people’s lives, economies, and ecosystems. Effective Research Data Management (RDM) supports open science initiatives by making data accessible to the public, policymakers, and other stakeholders, increasing transparency and encouraging public engagement. This openness builds trust and fosters greater awareness and informed decision-making on climate action.

Research Data Management underpins the accuracy, reproducibility, and impact of research findings, and supports collaborative and transparent science. At the IPCC, it helps ensure that the investment made in producing the assessments continues to benefit scientific inquiry and public policy.

Callout

“The Turing Way”, an open science and community-driven project focused on making data science more accessible, understandable, and effective, offers a general overview of the purposes and practices of RDM, together with guidelines and practical approaches for applying them.

Reproducible Research according to the Turing Way

For instance, some IPCC Working Groups may propose a Data Management Plan.

Software Management Plans

Before diving into Software Management Plans, it is important to highlight the distinctions between research code and research software:

Research Code is the individual, often experimental, coding work that solves specific problems in the research process. It is often a custom solution developed for a specific research question or experiment, for instance the script used to generate one of the figures in the IPCC reports.

Research Software is a broader, often more stable tool or platform that supports research across various stages of the workflow, for example ESMValTool. Both are critical components of modern research, and research code often contributes to the development of research software.

Key characteristics of Research Code

Custom and Domain-Specific: It is typically tailored to address the unique needs of a particular research task or domain (e.g., bioinformatics, physics simulations, social sciences).
Prototyping and Experimentation: Often experimental, evolving during the research process as the researcher tests and refines ideas. This could be in the form of scripts for data collection, analysis, or visualization.
Reproducible: In many cases, research code is shared openly to promote reproducibility and transparency. Open-source platforms like GitHub, GitLab, and Bitbucket are commonly used for sharing and collaborating on research code. Scripts may be written as Jupyter Notebooks and re-executed on Jupyter platforms such as JupyterLab, JupyterHub or Binder.

Key characteristics of Research Software

Comprehensive and integrated: supporting tasks like data management, analysis, and visualization, often with a user-friendly interface. For instance, tools like SPSS, MS Excel, or Tableau.
Production-ready: stable and maintainable, with error handling and documentation. It can be domain-specific (e.g., statistical tools, simulation platforms) or general-purpose (e.g., text editors, database systems) and is widely used in research.

Callout

Some Working Groups may consider proposing a Software Management Plan (SMP). This is usually a document that addresses questions such as:

What does it do?
Who is it for?
What resources does it need?
Who is responsible?
What licence does it need?
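The five questions above can also be tracked in a structured form, so that unanswered items are easy to spot before code is released. The following is a minimal Python sketch under stated assumptions: the `SoftwareManagementPlan` class, its field names and the example answers are hypothetical illustrations, not an official IPCC SMP template.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class SoftwareManagementPlan:
    """Hypothetical record of the five SMP questions; an empty string means 'not yet answered'."""
    purpose: str = ""      # What does it do?
    audience: str = ""     # Who is it for?
    resources: str = ""    # What resources does it need?
    responsible: str = ""  # Who is responsible?
    licence: str = ""      # What licence does it need?

    def unanswered(self) -> List[str]:
        """Return the names of the questions that still have no answer, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


smp = SoftwareManagementPlan(
    purpose="Script that regenerates a chapter figure from its input data",
    audience="Chapter authors and readers who want to reproduce the figure",
    licence="MIT",
)
print(smp.unanswered())  # ['resources', 'responsible']
```

A plain dataclass keeps the sketch dependency-free; a real plan would more likely live in a document or a repository file, with a check like this run only as a reminder.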

Having this clarity early on avoids problems later and helps the IPCC deliver FAIR code and software. Examples of SMPs exist in many organisations. A detailed list of elements relevant to defining an SMP for research code and software is provided by the Netherlands eScience Center.

At the IPCC, SMPs can define the key aspects that authors should take into account, depending on whether they will develop and release simple scripts for data analysis and visualisation, or more complex research software such as a new IPCC Atlas.

Challenge 1: Can you classify the following software types?

Discussion

Challenge

Output

Key Points

Content from AR7 Tutorial

Last updated on 2024-10-25

Estimated time: 90 minutes

Overview

Questions

How do I generate digital outputs?
How do I describe and curate digital outputs?
How do I transfer the digital outputs to the TSU?

Objectives

Produce data and figures (tooling/programming)
Create and manage a software repository
Generate Metadata
Obtain a DOI for data
Obtain a DOI for software

The AR6 experience and lessons learnt

Author profiles

Spreadsheet Sam, Notebook Nancy, Script Sandy

Categories and governance of digital IPCC products

From figures to interactive applications

IP, credit and ownership

IPCC applications vs external sponsored applications

Review process

Content from Editing Tutorial - Markdown

Last updated on 2024-10-25

Estimated time: 12 minutes

Overview

Questions

How do you write a lesson using Markdown and sandpaper?

Objectives

Explain how to use markdown with The Carpentries Workbench
Demonstrate how to include pieces of code, figures, and nested challenge blocks

Introduction

This is a lesson created via The Carpentries Workbench. It is written in Pandoc-flavored Markdown for static files and R Markdown for dynamic files that can render code into output. Please refer to the Introduction to The Carpentries Workbench for full documentation.

What you need to know is that there are three sections required for a valid Carpentries lesson:

questions are displayed at the beginning of the episode to prime the learner for the content.
objectives are the learning objectives for an episode displayed with the questions.
keypoints are displayed at the end of the episode to reinforce the objectives.

Instructor Note

Inline instructor notes can help inform instructors of timing challenges associated with the lessons. They appear in the “Instructor View”.

Challenge

Challenge 1: Can you do it?

What is the output of this command?

R

paste("This", "new", "lesson", "looks", "good")

Output

OUTPUT

[1] "This new lesson looks good"

Challenge

Challenge 2: How do you nest solutions within challenge blocks?

Show me the solution

You can add a line with at least three colons and a solution tag.

Figures

You can use standard markdown for static figures with the following syntax:

![optional caption that appears below the figure](figure url){alt='alt text for accessibility purposes'}

You belong in The Carpentries!

Callout

Callout sections can highlight information.

They are sometimes used to emphasise particularly important points but are also used in some lessons to present “asides”: content that is not central to the narrative of the lesson, e.g. by providing the answer to a commonly-asked question.

Math

One of our episodes contains $\LaTeX$ equations when describing how to create dynamic reports with {knitr}, so we now use MathJax to render them:

`$\alpha = \dfrac{1}{(1 - \beta)^2}$` becomes: $\alpha = \dfrac{1}{(1 - \beta)^2}$

Cool, right?

Key Points

Use .md files for episodes when you want static content
Use .Rmd files for episodes when you need to generate output
Run sandpaper::check_lesson() to identify any issues with your lesson
Run sandpaper::build_lesson() to preview your lesson locally

Content from Lifecycle of figures in the IPCC Reports

Last updated on 2024-10-25

Estimated time: 10 minutes

Overview

Questions

How do figures and figure submission requirements evolve throughout the assessment cycle?

Objectives

Provide an overview of the figure life cycle
Describe evolving requirements for figure submission at the different draft versions

Figures evolve throughout the assessment cycle. At each draft, new figures are created, while others are discarded or combined. At the end of the process, for the publication of the final draft, we wish to collect information on who created each figure, how, and using what data. Storing this information allows figure authors to get credit for their work, and allows other researchers to build on the work of the IPCC, in line with the best practices of open science.

The following lays out instructions for authors on how to organize figure information for submission to the TSU. Requirements are basic for the zero order draft, and increase in comprehensiveness as we move toward the Final Government Draft.

Zero Order Draft

At this point, figures are mostly placeholders. Authors will, for example, suggest that “here should be a figure showing x, y, z”. Authors are not expected to submit an actual figure at this stage.

First Order Draft

TODO

Second Order Draft

TODO

Final Government Draft

Here we expect authors to submit:

- The figure itself.
- The data used to create the figure, and a reference for each dataset. Note that this data should be as close as possible to what is shown in the figure. If any analysis is required to translate original input data into figure-ready data, then authors should publish this data, get a DOI for it, and reference it in the figure metadata. See TODO.
- The code used to create the figure based on the data provided.
- Information on the author(s) of the figure.
- The proposed caption for the figure (???)
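Before submission, a figure package could be checked against this list automatically. The Python sketch below assumes a hypothetical dictionary layout: the keys `figure`, `data`, `code`, `authors` and `caption`, and the per-dataset keys `reference`, `derived` and `doi`, are illustrative only and are not an official TSU schema. A dataset marked `derived` stands for figure-ready data produced by the authors' own analysis, which the requirements above say should get a DOI.

```python
from typing import Any, Dict, List

# Hypothetical top-level items mirroring the Final Government Draft list.
REQUIRED_ITEMS = ("figure", "data", "code", "authors", "caption")


def missing_items(submission: Dict[str, Any]) -> List[str]:
    """Return the required items that are absent or empty in a figure submission."""
    missing = [key for key in REQUIRED_ITEMS if not submission.get(key)]
    for i, dataset in enumerate(submission.get("data", [])):
        # Every dataset needs a reference.
        if not dataset.get("reference"):
            missing.append(f"data[{i}].reference")
        # Figure-ready data derived by the authors should also have its own DOI.
        if dataset.get("derived") and not dataset.get("doi"):
            missing.append(f"data[{i}].doi")
    return missing


submission = {
    "figure": "figure.png",                # placeholder file name
    "data": [
        {"reference": "Citation of the input dataset", "derived": False},
        {"reference": "Figure-ready data derived by the authors", "derived": True},
    ],
    "code": "https://example.org/repo",    # placeholder URL
    "authors": ["A. Author"],
    "caption": "",
}
print(missing_items(submission))  # ['caption', 'data[1].doi']
```

Such a check only confirms that the items are present; whether the data and code are actually sufficient to recreate the figure still needs human review.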

Figures adapted to different audiences

Some key figures prepared by chapters are highlighted in the Technical Summary (TS), and later in the Summary for Policymakers (SPM). The chapters, TS and SPM have different intended audiences, so figures need to be adapted. This process will be facilitated by the data collection described above.

For example, the figures below show how chapter figure 6.3 and its underlying data were reused to create new figures for the Technical Summary (TS9), and later the SPM (SPM3).

Chapter figure | TS figure | SPM figure

Content from Licensing Tutorial

Last updated on 2024-11-14

Estimated time: 11 minutes

Overview

Questions

What licenses apply to the datasets or data products we use?
What licenses should we apply to created datasets?

Objectives

Understand how to classify source code and data (input, intermediate assessment, final assessment)
Understand recommended licenses for each data type and code

Introduction

The content of this lesson is based on the recommendations of the IPCC Task Group on Data Support for Climate Change Assessments (TG-Data; Huard et al. 2022).

Licensing of IPCC material, with clear and consistent meaning in all legal jurisdictions, is essential to facilitate its appropriate use to address pressing climate change challenges, while protecting the rights of data providers.

Callout

The IPCC reports and data are licensed separately!

IPCC reports are published under a copyright license that prohibits commercial use and the creation of derivative products unless permission is first granted by the IPCC Secretariat. This license protects IPCC reports from distortion, since these reports are accepted by member governments, or approved in the case of the Summary for Policymakers, and adopted in the case of the Synthesis Report. If the same license were applied to data products, it would severely limit their usefulness and value. A different IPCC data license is required to allow the creation of derivatives for the pursuit of research and the re-use of IPCC data-based products for national assessments, adaptation and mitigation policies.

Classifying Data Types

TG-Data distinguishes three categories of data: input data, intermediate assessment data, and final assessment data.

Input data denotes the source data that underpins information in the assessment reports. It is typically authored by credible, authoritative, trusted sources, who decide under which license it is published.

Intermediate assessment data is the outcome of data processing and analysis performed as part of the assessment as an intermediate step in the generation of final assessment data. Data is only defined as intermediate if it has gone through non-trivial processing to be considered an original product, distinct from the input data.

Final assessment data refers to data that is directly presented in data tables or graphically displayed (e.g. as a line graph or a spatial map) in the report.

Source code refers to scripts, online code repositories, and software libraries written to create intermediate and final assessed data, as well as the figures included in the reports.

Licenses For Different Data Types

Input data shall be licensed under the same license terms and conditions imposed by the data providers. Input data copyright holders are encouraged to adopt well-known licenses enabling broad usage, including commercial use, and avoid “ShareAlike” licenses.

Intermediate and final assessment data should be licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license, where this does not infringe the interests of relevant license holders. The Creative Commons family of licenses is designed to provide legal interoperability across virtually all jurisdictions.

When input datasets are published under restrictive licenses, waivers or exemptions can be sought for the IPCC assessment reports. These waivers should be negotiated with copyright holders by Working Group co-chairs, with guidance from TG-Data representatives.

These waivers would ensure that derivative products can be licensed by the IPCC under CC BY 4.0, and that the version used by the assessment report is curated in a long-term archive, either by the IPCC Data Distribution Centre (DDC) or another trusted data repository. If exemptions cannot be obtained from the copyright owners, the applicable licenses of the input data will apply.

To ensure maximal reusability of source code, as with data, code should be published under permissive (non-copyleft) open source licenses that do not restrict commercial use.
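These recommendations can be summarised as a lookup from category to acceptable licenses. The Python sketch below is illustrative only. The `Category` enum and `acceptable_licenses` function are hypothetical, and licenses are written as SPDX identifiers. The permissive code licenses listed (MIT, BSD-3-Clause, Apache-2.0) are assumed examples, because the recommendations do not name specific ones. The sketch also ignores waivers and the caveat that CC BY 4.0 applies only where it does not infringe the interests of relevant license holders.

```python
from enum import Enum
from typing import Optional, Tuple


class Category(Enum):
    """The three TG-Data data categories, plus source code."""
    INPUT = "input data"
    INTERMEDIATE = "intermediate assessment data"
    FINAL = "final assessment data"
    SOURCE_CODE = "source code"


# Assumed examples of permissive (non-copyleft) licenses, as SPDX identifiers.
PERMISSIVE_CODE_LICENSES = ("MIT", "BSD-3-Clause", "Apache-2.0")


def acceptable_licenses(category: Category, provider_license: Optional[str] = None) -> Tuple[str, ...]:
    """Return the license(s) an item in `category` would normally be published under."""
    if category is Category.INPUT:
        # Input data keeps the terms imposed by its provider.
        if provider_license is None:
            raise ValueError("input data keeps the license chosen by its provider")
        return (provider_license,)
    if category in (Category.INTERMEDIATE, Category.FINAL):
        # Data produced as part of the assessment itself.
        return ("CC-BY-4.0",)
    # Source code: any permissive license that does not restrict commercial use.
    return PERMISSIVE_CODE_LICENSES


print(acceptable_licenses(Category.FINAL))               # ('CC-BY-4.0',)
print(acceptable_licenses(Category.INPUT, "CC-BY-4.0"))  # ('CC-BY-4.0',)
```

Raising an error for input data without a known provider license is a deliberate choice: it forces the author to look up the provider's terms instead of defaulting to an IPCC license.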

Challenge

Challenge 1: Can you classify the following data types?

A map used in the report
Output from a CMIP6 model
Model agreement on changes in temperature in a warming scenario

Output

A map used in the report: Final
Output from a CMIP6 model: Input
Model agreement on changes in temperature in a warming scenario: Intermediate

Key Points

Data can be classified into input, intermediate assessment, or final assessment data.
Input data shall be licensed under the same license terms and conditions imposed by the data providers.
Data produced as part of the IPCC assessment, be it intermediate or final assessment data, shall be published, wherever possible, under the CC BY 4.0 license.