Discover and publish cutting edge, open research.

Browse 15,953 multi-disciplinary research preprints

Most recent

Hung Do

and 3 more

Toshiaki Jo

and 1 more

Environmental DNA (eDNA) analysis is a promising tool for non-disruptive and cost-efficient estimation of species abundance. However, its practical applicability in natural environments is limited owing to a potential gap between eDNA concentration and species abundance in the field. Although the importance of accounting for eDNA dynamics, such as transport and degradation, has been discussed, the influence of eDNA characteristics, including production source and cellular/molecular state, on the accuracy of eDNA-based abundance estimation has been largely overlooked. We conducted meta-analyses of 44 previous eDNA studies and investigated the relationships between the accuracy (R2) of eDNA-based abundance estimation and eDNA characteristics. First, we found that estimated R2 values were significantly lower for crustaceans and mussels than for fish. This finding suggests that the less frequent eDNA production of these taxa, owing to their external morphology and physiology, may impede accurate estimation of their abundance via eDNA. Moreover, linear mixed modeling showed that, despite high variances, R2 values were positively correlated with filter pore size, indicating that selective collection of larger-sized eDNA, which is typically fresher, could improve the estimation accuracy of species abundance. Although our dataset was somewhat biased toward studies targeting specific taxa, our findings shed new light on which characteristics of eDNA should be targeted for more accurate estimation of species abundance. Further empirical studies are required to validate our findings and fully elucidate the relationship between eDNA characteristics and eDNA-based abundance estimation.

Victoria DeLeo

and 5 more

Animal seed dispersers may influence plant genetic diversity, though there are few examples linking disperser behavior to population genomic diversity. We hypothesized that breeding colonies of the frugivorous White-Crowned Pigeon (Patagioenas leucocephala) would increase population diversity and decrease population differentiation in fruit trees at nesting sites due to increased seed dispersal from foraging trips. We measured the density and extent of colonies at Parque Nacional Jaragua (Dominican Republic) and used nuclear and plastid SNPs from ddRADseq to examine the spatial genetic structure of two common species: poisonwood (Metopium toxiferum), a key fruit resource during the breeding season, and gumbo limbo (Bursera simaruba). We found that pigeon nesting aggregations in and around Parque Nacional Jaragua occupy areas between 3 and 5 km2, with the number of active nests for 2016, extrapolated to 3 km2, estimated at 159,144 ± 21,484 (s.e.), making this one of the largest breeding aggregations for the species across its range. However, colony locations did not determine tree genetic diversity and differentiation. Gumbo limbo (consumed by a diverse community) showed less isolation by distance than poisonwood. Saplings and plastid markers, expected to be more strongly influenced by seed dispersal, did not display geographic structure associated with colony sites, suggesting that patterns were not primarily due to pigeon foraging. Our results highlight the diversity of population genomic patterns among co-occurring species with similar ecological niches and demonstrate the limited capacity of frugivores to influence genetic differences among plant species.

Elard Hurtado

and 2 more

In this paper, we establish the existence of infinitely many weak solutions for a class of quasilinear stationary Kirchhoff type equations, which involves a general variable exponent elliptic operator with critical growth. Precisely, we study the following nonlocal problem \begin{equation*} \begin{cases} -\displaystyle{M}(\mathscr{A}(u))\operatorname{div}\Bigl(a(|\nabla u|^{p(x)})|\nabla u|^{p(x)-2}\nabla u\Bigr) = \lambda f(x,u)+ |u|^{s(x)-2}u \text{ in }\Omega, \\ u = 0 \text{ on } \partial \Omega, \end{cases} \end{equation*} where $\Omega$ is a bounded smooth domain of $\mathbb{R}^N,$ with homogeneous Dirichlet boundary conditions on $\partial \Omega,$ the nonlinearity $f:\overline{\Omega}\times \mathbb{R}\to \mathbb{R}$ is a continuous function, $a:\mathbb{R}^{+}\to\mathbb{R}^{+}$ is a function of class $C^{1},$ $M:\mathbb{R}^{+}_{0}\to\mathbb{R}^{+}$ is a continuous function, whose properties will be introduced later, $\lambda$ is a positive parameter, and $p,s\in C(\overline{\Omega})$. We assume that $\mathscr{C}=\{x\in \Omega: s(x)=\gamma^{*}(x)\}\neq \emptyset,$ where $\gamma^{*}(x)=N\gamma(x)/(N-\gamma(x))$ is the critical Sobolev exponent. We prove that the problem has infinitely many solutions and also obtain the asymptotic behavior of the solutions as $\lambda\to 0^{+}$. Furthermore, we emphasize that, in contrast with previous research, the conditions on $a(\cdot)$ are general enough to incorporate several interesting differential operators.
Our work covers a distinctive feature of Kirchhoff problems, namely that the Kirchhoff function $M$ is nonzero at zero; it also covers a wide class of nonlocal problems for $p(x)>1,$ for all $x\in \overline{\Omega}.$ The main tools for finding critical points of the Euler-Lagrange functional associated with this problem are a suitable truncation argument, the concentration-compactness principle for variable exponents found in \cite{bonder}, and the genus theory introduced by Krasnoselskii.

Mai Nagaoka

and 11 more

Background and Purpose: Orally administered ketoconazole rarely induces liver injury and adrenal dysfunction. In cellulo studies have shown that a metabolite formed by arylacetamide deacetylase (AADAC)-mediated hydrolysis is relevant to ketoconazole-induced cytotoxicity. This study examined the significance of AADAC in ketoconazole-induced toxicity in vivo using Aadac knockout mice. Experimental Approach: Wild-type and Aadac knockout mice orally received 150 or 300 mg/kg/day ketoconazole, and plasma parameters, the concentrations of ketoconazole and N-deacetylketoconazole in plasma and tissues, and hepatic mRNA levels of immune- and inflammation-related factors were measured. The effects of pretreatment with corticosterone (40 mg/kg, s.c.) on ketoconazole-induced liver injury were also examined. Key Results: After a single oral administration of 150 mg/kg ketoconazole, the area under the plasma concentration-time curve of ketoconazole in Aadac knockout mice was significantly higher, and that of N-deacetylketoconazole significantly lower, than in wild-type mice. With administration of ketoconazole (300 mg/kg/day) for 7 days, Aadac knockout mice showed higher mortality (100%) than wild-type mice (42.9%), with significantly higher plasma alanine transaminase and lower corticosterone levels, indicating liver injury and adrenal dysfunction, respectively. In Aadac knockout mice, hepatic mRNA levels of immune- and inflammation-related factors were increased by ketoconazole administration, and this increase was reversed by replenishment of corticosterone, which exerts anti-inflammatory effects. Conclusion and Implications: Aadac deficiency exacerbated ketoconazole-induced liver injury by inhibiting glucocorticoid synthesis and enhancing the inflammatory response. This in vivo study revealed that hydrolysis of ketoconazole by AADAC can mitigate ketoconazole-induced toxicities.

Sena Sert

and 1 more

Introduction: Isolated tricuspid valve prolapse (TVP) is a rare finding on transthoracic echocardiography. Right atrial enlargement or prominent "v" waves, consequences of the hemodynamic changes of severe tricuspid regurgitation (TR), are rarely seen with isolated TVP. We present a case of isolated prolapse of the anterior tricuspid leaflet presenting with giant C-V waves, also known as Lancisi's sign. Case Report: A 66-year-old male presented with increasing exercise limitation and leg edema in recent months, complaining of a persistent pulsation in his neck. Examination revealed an elevated jugular venous pulse with prominent systolic pulsation representing giant C-V waves (Lancisi's sign), a consequence of severe TR due to isolated prolapse of the anterior leaflet. The patient's symptoms resolved completely after tricuspid valve replacement. Discussion: TVP is best defined in the parasternal short-axis view as atrial displacement (AD) of one or more leaflets of more than 2 mm. TVP can also be detected in the four-chamber view (AD of more than 2 mm) or the right ventricular inflow view (AD of more than 4 mm). As a consequence of TVP, severe TR alters the physiological jugular venous waveform. In severe TR, retrograde blood flow into the right atrium during ventricular systole restrains the x descent and produces a fusion of the c and v waves, which appears on physical examination as a large pulsation called Lancisi's sign. Conclusion: Lancisi's sign is a large visible systolic neck pulsation produced by fusion of the c and v waves when severe TR prevents the x descent.

Mayuko Seki

and 6 more

Biochar application is currently considered an effective soil organic carbon (SOC) management practice to prevent land degradation by enhancing SOC stock. However, quantitative information on the impact of biochar application on carbon dioxide (CO2) flux and the associated microbial responses is still scarce, especially in degraded tropical agroecosystems. Here, we evaluated the impact of land management (control (C), biochar (B; 8.2 Mg C ha−1), farmyard manure (FYM) (M; 1.1 Mg C ha−1 yr−1), and a mixture of both (BM; 8.2 Mg biochar-C ha−1 and 1.1 Mg FYM-C ha−1 yr−1)) on CO2 flux, SOC stock, microbial biomass C (MBC), and metabolic quotient (qCO2) in a degraded tropical alkaline cropland of southern India, based on a 27-month field experiment. Cumulative CO2 flux over the experiment was 2.4, 2.7, 4.0, and 3.7 Mg C ha−1 in the C, B, M, and BM treatments, respectively. Biochar application increased soil moisture and SOC stock but did not affect CO2 flux, MBC, or qCO2, indicating a limited microbial response to the increased soil moisture because of the small amount of SOC. Combined application of biochar and FYM did not increase CO2 flux compared with FYM alone, owing to the minimal difference in microbial responses between the M and BM treatments. Additionally, the SOC increment (8.9 Mg C ha−1) and the rate of C-input retention in soil (0.78) were greatest in the BM treatment. Hence, combined application of biochar and FYM could be a sustainable land-management practice that efficiently increases SOC stock in degraded tropical cropland.

Marta Cortes

and 5 more

Objectives. Neuroblastoma is the most common extracranial tumour in children, and the prognosis for refractory and relapsed disease is still poor. Early Phase clinical trials play a pivotal role in the development of novel drugs, and ensuring adequate recruitment is crucial. The primary aim was to determine the rate of participation in early Phase trials among children with refractory/relapsed neuroblastoma at two of the largest European drug development institutions. Methods. Data from patients diagnosed with refractory/relapsed neuroblastoma between January 2012 and December 2018 at the two institutions were collected and analysed. Results. Overall, 48 patients were included. A total of 31 (65%) refractory/relapsed cases were enrolled in early Phase trials. The main reasons for not participating in clinical trials were not fulfilling eligibility criteria prior to consent (12/17, 70%) and screening failure (2/17, 12%). Median time on trial was 4.3 months (range 0.6-13.4). The most common cause of trial discontinuation was disease progression (67.7%). Median overall survival was longer in refractory (28 months, 95% CI 20.9-40.2) than in relapsed patients (14 months, 95% CI 8.1-20.1) [p=0.034]. Conclusions. Although two thirds of children with refractory/relapsed neuroblastoma were enrolled in early Phase trials, recruitment rates can still be improved. The main reason for not participating in trials was not fulfilling eligibility criteria prior to consent, mainly due to performance status and short life expectancy. This study highlights the hurdles to accessing innovative therapies for children with relapsed/refractory neuroblastoma and identifies key areas of development to improve recruitment to early Phase trials.

Mitchell Elkind

and 10 more

Objective: To evaluate the cost-effectiveness of insertable cardiac monitors (ICMs) compared to standard of care (SoC) for detecting atrial fibrillation (AF) in patients at high risk of stroke (CHADS2 >2) in the US. Background: ICMs are a clinically effective means of detecting AF in high-risk patients, prompting the initiation of non-vitamin K oral anticoagulants (NOACs). Their cost-effectiveness from a US payer perspective is not yet known. Methods: Using patient data from the REVEAL AF trial (n=446, average CHADS2 score 2.9), a Markov model estimated the lifetime costs and benefits of detecting AF with an ICM or with SoC (namely, intermittent use of electrocardiograms [ECGs] and 24-hour Holter monitors). Ischemic and hemorrhagic strokes, intra- and extra-cranial hemorrhages, and minor bleeds were modelled. Diagnostic and device costs were included, plus the costs of treating stroke and bleeding events and of NOACs. Costs and health outcomes, measured as quality-adjusted life years (QALYs), were discounted at 3% per annum. One-way deterministic and probabilistic sensitivity analyses (PSA) were undertaken. Results: The lifetime per-patient cost for ICM was $58,132 vs. $52,019 for SoC. ICMs generated a total of 7.75 QALYs vs. 7.59 for SoC, with 34 fewer strokes projected per 1,000 patients. The incremental cost-effectiveness ratio (ICER) was $35,452 per QALY gained. ICMs were cost-effective in 72% of PSA simulations, using a $50,000 per QALY threshold. Conclusions: The use of ICMs to identify AF in a high-risk population is likely to be cost-effective in the US healthcare setting.
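The headline figures above can be sanity-checked with the standard formula ICER = ΔCost/ΔQALY. A minimal sketch using the rounded per-patient values reported in the abstract; note the published $35,452/QALY figure was computed from unrounded model outputs, so the rounded inputs here give a slightly higher ratio:

```python
# Recompute the ICER from the rounded values reported in the abstract.
cost_icm, cost_soc = 58_132, 52_019   # lifetime per-patient costs (USD)
qaly_icm, qaly_soc = 7.75, 7.59       # lifetime quality-adjusted life years

delta_cost = cost_icm - cost_soc      # incremental cost: $6,113
delta_qaly = qaly_icm - qaly_soc      # incremental benefit: 0.16 QALYs

icer = delta_cost / delta_qaly        # dollars per QALY gained
print(f"ICER ≈ ${icer:,.0f}/QALY")    # ≈ $38,206 with rounded inputs
```

The gap between this back-of-the-envelope ratio and the reported $35,452/QALY is consistent with the abstract rounding the QALY totals to two decimal places.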

Browse more recent preprints

Recently published in scholarly journals

Colum Keohane

and 6 more

Abstract Objective: To determine whether the introduction in July 2016 of a one-stop see-and-treat clinic offering early reflux ablation for Venous Leg Ulcer (VLU) patients has affected rates of unplanned inpatient admissions due to venous ulceration. Design: Review of inpatient admission data and analysis of related costs. Materials: The Hospital Inpatient Enquiry collects data from acute public hospitals in Ireland on admissions and discharges, coded by diagnosis and acuity. This was the primary source of all data relating to admissions and length of stay. Costs were calculated from data published by the Health Service Executive in Ireland on average costs per inpatient stay for given diagnosis codes. Methods: Data were collected on admission rates, length of stay, overall bed-day usage, and costs across a four-year period: the two years since the introduction of the rapid access clinic, and the two years immediately prior as a control. Results: 218 patients admitted with VLUs accounted for a total of 2,529 inpatient bed-days, with a median of 4.5 (2-6) unplanned admissions and a median hospital stay of 7 (4-13) days per month. Median unplanned admissions per month decreased from 6 (2.5-8.5) in the control period to 3.5 (2-5) after introduction of the clinic (p=.040). Bed-day usage was significantly reduced from a median of 62.5 (27-92.5) to 36.5 (21-44) bed-days per month (p=.035), though length of stay remained unchanged (p=.57). The cost of unplanned inpatient admissions fell from a median of €33,336.25 (€14,401.26-€49,337.65) per month to €19,468.37 (€11,200.98-€22,401.96) (p=.03). Conclusions: Admissions for inpatient management of VLUs have fallen since the start of aggressive endovenous treatment of venous reflux in a dedicated one-stop see-and-treat clinic for these patients. As a result, bed-day usage has also fallen, leading to cost savings.
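To put the reported cost reduction in context, the difference between the median monthly costs quoted above can be worked out directly. Medians are not strictly additive across months, so this is an order-of-magnitude illustration, not a figure from the study:

```python
# Rough scale of the savings implied by the reported median monthly costs.
pre_clinic = 33_336.25    # median monthly cost of unplanned admissions (EUR)
post_clinic = 19_468.37   # median monthly cost after the clinic opened (EUR)

monthly_saving = pre_clinic - post_clinic
print(f"≈ €{monthly_saving:,.2f}/month, €{12 * monthly_saving:,.2f}/year")
# ≈ €13,867.88/month, €166,414.56/year
```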

Mohammed Al-Sadawi

and 7 more

Abstract: Background: This meta-analysis assessed the relationship between Obstructive Sleep Apnea (OSA) and echocardiographic parameters of diastolic dysfunction (DD), which are used in the assessment of Heart Failure with Preserved Ejection Fraction (HFpEF). Methods: We searched databases including Ovid MEDLINE, Ovid Embase, Scopus, Web of Science, Google Scholar, and EBSCO CINAHL from inception up to December 26, 2020. The search was not restricted by time, publication status, or language. Comparisons were made between patients with OSA, diagnosed by in-laboratory polysomnography (PSG) or home sleep apnea testing (HSAT), and patients without OSA in relation to established markers of diastolic dysfunction. Results: The primary search identified 2512 studies. A total of 18 studies including 2509 participants were included. The two groups were free of conventional cardiovascular risk factors. Significant structural changes were observed between the two groups. Patients with OSA exhibited a greater LAVI (3.94, CI [0.8, 7.07]; p<0.001) and left ventricular mass index (11.10, CI [2.56, 19.65]; p<0.001) compared with the control group. The presence of OSA was also associated with a more prolonged DT (10.44 ms, CI [0.71, 20.16]; p=0.04) and IVRT (7.85 ms, CI [4.48, 11.22]; p<0.001), and a lower E/A ratio (-0.62, CI [-1, -0.24]; p=0.001), suggestive of early DD. The E/e' ratio was also increased (0.94, CI [0.44, 1.45]; p<0.001). Conclusion: An association between OSA and echocardiographic parameters of DD was detected that was independent of conventional cardiovascular risk factors. OSA may be independently associated with DD, perhaps due to higher LV mass. Investigating the role of CPAP therapy in reversing or ameliorating diastolic dysfunction is recommended.
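Pooled estimates of the kind reported above (a mean difference with a 95% CI) are typically obtained by inverse-variance weighting of study-level effects. A minimal sketch of fixed-effect pooling is below; the study-level numbers are purely hypothetical illustrations, not data from this meta-analysis:

```python
# Fixed-effect (inverse-variance) pooling of mean differences.
# Each tuple is a hypothetical study: (mean difference, 95% CI low, 95% CI high).
import math

studies = [(12.0, 2.0, 22.0), (8.0, -1.0, 17.0), (11.0, 4.0, 18.0)]

total_weight, weighted_sum = 0.0, 0.0
for md, lo, hi in studies:
    se = (hi - lo) / (2 * 1.96)   # back out the standard error from the 95% CI
    w = 1.0 / se**2               # inverse-variance weight
    total_weight += w
    weighted_sum += w * md

pooled = weighted_sum / total_weight
se_pooled = math.sqrt(1.0 / total_weight)
print(f"pooled MD = {pooled:.2f}, "
      f"95% CI [{pooled - 1.96 * se_pooled:.2f}, {pooled + 1.96 * se_pooled:.2f}]")
```

Random-effects models (e.g. DerSimonian-Laird) extend this by adding a between-study variance component to each weight, which is the more common choice when heterogeneity is expected.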

Hans Fangohr

and 2 more

Guest Editors’ Introduction

Notebook interfaces – documents combining executable code with output and notes – first became popular as part of computational mathematics software such as Mathematica and Maple. The Jupyter Notebook, which began as part of the IPython project in 2012, is an open source notebook that can be used with a wide range of general-purpose programming languages.

Before notebooks, a scientist working with Python code, for instance, might have used a mixture of script files and code typed into an interactive shell. The shell is good for rapid experimentation, but the code and results are typically transient, and a linear record of everything that was tried would be long and not very clear. The notebook interface combines the convenience of the shell with some of the benefits of saving and editing code in a file, while also incorporating results, including rich output such as plots, in a document that can be shared with others.

The Jupyter Notebook is used through a web browser. Although it is often run locally, on a desktop or a laptop, this design means that it can also be used remotely, so the computation occurs, and the notebook files are saved, on an institutional server, a high performance computing facility or in the cloud. This simplifies access to data and computational power, while also allowing researchers to work without installing any special software on their own computer: specialized research software environments can be provided on the server, and the researcher can access those with a standard web browser from their computer.

These advantages have led to the rapid uptake of Jupyter notebooks in many kinds of research. The articles in this special issue highlight this breadth, with the authors representing various scientific fields.
But more importantly, they describe different aspects of using notebooks in practice, in ways that are applicable beyond a single field.

We open this special issue with an invited article by Brian Granger and Fernando Perez – two of the co-founders and leaders of Project Jupyter. Starting from the origins of the project, they introduce the main ideas behind Jupyter notebooks, and explore the question of why Jupyter notebooks have been so useful to such a wide range of users. They have three key messages. The first is that notebooks are centered around the humans using them and building knowledge with them. Next, notebooks provide a write-eval-think loop that lets the user have a conversation with the computer and the system under study, which can be turned into a persistent narrative of computational exploration. The third idea is that Project Jupyter is more than software: it is a community that is nourished deliberately by its members and leaders.

The following five articles in this special issue illustrate the key features of Project Jupyter effectively. They show us a small sample of where researchers can go when empowered by the tool, and represent a range of scientific domains.

Stephanie Juneau et al. describe how Jupyter has been used to ‘bring the compute to the data’ in astrophysics, allowing geographically distributed teams to work efficiently on large datasets. Their platform is also used for education and training, including giving school students a realistic taste of modern science.

Ryan Abernathey et al., of the Pangeo project, present a similar scenario with a focus on data from the geosciences. They have enabled analysis of big datasets on public cloud platforms, facilitating a more widely accessible ‘pay as you go’ style of analysis without the high fixed costs of buying and setting up powerful computing and storage hardware.
Their discussion of best practices includes details of the different data formats required for efficient access to data in cloud object stores rather than local filesystems.

Marijan Beg et al. describe features of Jupyter notebooks and Project Jupyter that help scientists make their research reproducible. In particular, the work focuses on the use of computer simulation and mathematical experiments for research. The self-documenting quality of the notebook—where the response to a code cell can be archived in the notebook—is an important aspect. The paper addresses wider questions, including the use of legacy computational tools, exploitation of HPC resources, and creation of executable notebooks to accompany publications.

Blaine Mooers describes the use of a snippet library in the context of molecular structure visualization. Using a Python interface, the PyMOL visualization application can be driven through commands to visualize molecular structures such as proteins and nucleic acids. By using those commands from the Jupyter notebook, a reproducible record of analysis and visualizations can be created. The paper focuses on making this process more user-friendly and efficient by developing a snippet library, which provides a wide selection of pre-composed and commonly used PyMOL commands, as a JupyterLab extension. These commands can be selected via hierarchical pull-down menus rather than having to be typed from memory. The article discusses the benefits of this approach more generally.

Aaron Watters describes a widget that can display 3D objects using webGL, while the back-end processes the scene using a data visualization pipeline. In this case, the front-end takes advantage of the client GPU for visualization of the widget, while the back-end takes advantage of whatever computing resources are accessible to Python.

The articles for this special issue were all invited submissions, in most cases from selected presentations given at JupyterCon in October 2020.
Each article was reviewed by three independent reviewers. The guest editors are grateful to Ryan Abernathey, Luca de Alfaro, Hannah Bruce MacDonald, Christopher Cave-Ayland, Mike Croucher, Marco Della Vedova, Michael Donahue, Vidar Fauske, Jeremy Frey, Konrad Hinsen, Alistair Miles, Arik Mitschang, Blaine Mooers, Samual Munday, Chelsea Parlett, Prabhu Ramachandran, John Readey, Petr Škoda and James Tocknell for their work as reviewers, along with other reviewers who preferred not to be named. The article by Brian Granger and Fernando Perez was invited by the editor-in-chief, and reviewed by the editors of this special issue.

Hans Fangohr is currently heading the Computational Science group at the Max Planck Institute for the Structure and Dynamics of Matter in Hamburg, Germany, and is a Professor of Computational Modelling at the University of Southampton, UK. A physicist by training, he received his PhD in Computer Science in 2002. He has authored more than 150 scientific articles in computational science and materials modelling, several open source software projects, and a textbook on Python for Computational Science and Engineering. Contact him at hans.fangohr@mpsd.mpg.de.

Thomas Kluyver is currently a software engineer at European XFEL. Since gaining a PhD in plant sciences from the University of Sheffield in 2013, he has been involved in various parts of the open source and scientific computing ecosystems, including the Jupyter and IPython projects. Contact him at thomas.kluyver@xfel.eu.

Massimo Di Pierro is a Professor of Computer Science at DePaul University. He has a PhD in Theoretical Physics from the University of Southampton and is an expert in numerical algorithms, high performance computing, and machine learning. Massimo is the lead developer of many open source projects including web2py, py4web, and pydal. He has authored more than 70 articles in Physics, Computer Science, and Finance and has published three books. Contact him at
Many societal opportunities and challenges, both current and future, are either inter- or transdisciplinary in nature. Focus and action to cut across traditional academic boundaries have increased in research and, to a lesser extent, teaching. One successful collaboration has been the augmentation of fields within the Humanities, Social Sciences, and Arts by integrating complementary tools and methods originating in STEM. This trend is gradually materializing in formal undergraduate and secondary education.

The proven effectiveness of Jupyter notebooks for teaching and learning STEM practices gives rise to a nascent case for education that seeks to replicate this interdisciplinary design to adopt notebook technology as the best pedagogical tool for the job. This article presents two sets of data to help argue this case.

The first set of data demonstrates the art of the possible. A sample of undergraduate and secondary level courses showcases existing or recent work of educational stakeholders in the US and UK who are already pioneering instruction where computational and data practices are integrated into the study of the Humanities, Social Sciences, and Arts, with Jupyter notebooks chosen as a central pedagogical tool. Supplementary data providing an overview of the types of technical material covered by each course syllabus further evidences the interdisciplinary education that is perceived to be feasible, or is already feasible, using Jupyter technology with student audiences at these levels.

The second set of data provides more granular, concrete insight derived from user experiences in a handful of the courses from the sample.
Four instructors and one student describe a range of pedagogical benefits and value that they attribute to the use of Jupyter notebooks in their courses. In presenting this nascent case, the article aims to stimulate the development of Jupyter notebook-enabled, computational, data-driven interdisciplinary education within undergraduate and secondary school programs.
Many high-performance computing applications are of high consequence to society. Global climate modeling is a historic example of this. In 2020, the societal issue of greatest concern, the still-raging COVID-19 pandemic, saw a legion of computational scientists turning their endeavors to new research projects in this direction. Applications of such high consequence highlight the need for building trustworthy computational models. Emphasizing transparency and reproducibility has helped us build more trust in computational findings. In the context of supercomputing, however, we may ask: how do we trust results from computations that cannot be repeated? Access to supercomputers is limited, computing allocations are finite (and competitive), and machines are decommissioned after a few years. In this context, we might ask how reproducibility can be ensured, certified even, without exercising the original digital artifacts used to obtain new scientific results. This is often the situation in HPC. It is compounded now by the greater adoption of machine learning techniques, which can be opaque. In 2017, the ACM issued a Statement on Algorithmic Transparency and Accountability, targeting algorithmic decision-making using data models \cite{council2017}. Among its seven principles, it calls for data provenance, auditability, validation and testing. These principles can be applied not only to data models, but to HPC in general. I want to discuss the next steps for reproducibility: how we may adapt our practice to achieve what I call unimpeachable provenance, and full auditability and accountability of scientific evidence produced via computation.

An invited talk at SC20

I was invited to speak at SC20 about my work and insights on transparency and reproducibility in the context of HPC. The session's theme was Responsible Application of HPC, and the title of my talk was "Trustworthy computational evidence through transparency and reproducibility."
At the previous SC, I had the distinction of serving as Reproducibility Chair, leading an expansion of the initiative, which was placed under the Technical Program that year. We moved to make Artifact Description appendices required for all SC papers, created a template and an author kit for the preparation of the appendices, and introduced three new Technical Program tracks in support of the initiative: the Artifact Description & Evaluation Appendices track (with an innovative double-open constructive review process), the Reproducibility Challenge track, and the Journal Special Issue track, for managing the publication of select papers on the reproducibility benchmarks of the Student Cluster Competition. This year, the initiative was augmented to address issues of transparency, in addition to reproducibility, and a community sentiment study was launched to assess the impact of the effort, six years in, and canvass the community's outlook on various aspects of it.

Allow me to thank here Mike Heroux, Reproducibility Chair for SC in 2017 and 2018; Michela Taufer, SC19 General Chair, who put her trust in me to inherit the role from Mike; and Beth Plale, the SC20 Transparency and Reproducibility Chair. I had countless inspiring and supportive conversations with Mike and Michela about the topic during the many months of planning for SC19, and more productive conversations with Beth during the transition to her leadership. Mike, Michela and I have served on other committees and working groups together, in particular the group that met in July 2017 at the National Science Foundation (convened by Almadena Chtchelkanova) for the Workshop on Reproducibility Taxonomies for Computing and Computational Science. My presentation at that event condensed an inventory of uses of various terms like reproducibility and replication across many fields of science \cite{barba2017}.
I then wrote the review article "Terminologies for Reproducible Research" and posted it on arXiv \cite{barba2018}. It informed our workshop's report, which came out a few months later as a Sandia technical report \cite{taufer2018}. In it, we highlighted that the fields of computational and computing sciences provided two opposing definitions of the terms reproducible and replicable, representing an obstacle to progress in this sphere.

The Association for Computing Machinery (ACM), representing computer science and industry professionals, had recently established a reproducibility initiative, and adopted definitions diametrically opposite to those used in the computational sciences for more than two decades. In addition to raising awareness of the contradiction, we proposed a path to a compatible taxonomy. Compatibility is needed here because the computational sciences—astronomy, physics, epidemiology, biochemistry and others that use computing as a tool for discovery—and the computing sciences (where algorithms, systems, software, and computers are the focus of study) have community overlap and often intersect in the venues of publication. The SC conference series is one example. Given the historical precedence and wider adoption of the definitions of reproducibility and replicability used in the computational sciences, our Sandia report recommended that the ACM definitions be reversed. Several ACM-affiliated conferences were already using the artifact review and badging system (approved in 2016), so this was no modest suggestion. The report, however, was successful in raising awareness of the incompatible definitions and the desirability of addressing them.

A direct outcome of the Sandia report was a proposal to the National Information Standards Organization (NISO) for a Recommended Practice Toward a Compatible Taxonomy, Definitions, and Recognition Badging Scheme for Reproducibility in the Computational and Computing Sciences.
NISO is accredited by the American National Standards Institute (ANSI) to develop, maintain, and publish consensus-based standards for information management. The organization has more than 70 members: publishers, information aggregators, libraries, and other content providers use its standards. I co-chaired this particular working group with Gerry Grenier from IEEE and Wayne Graves from ACM; Mike Heroux was also a member. The goal of the NISO Reproducibility Badging and Definitions Working Group was to develop a Recommended Practice document, a step before the development of a standard. As part of our joint work, we prepared a letter addressed to the ACM Publications Board, delivered in July 2019. It described the context and need for compatible reproducibility definitions and made the concrete request that ACM consider a change. By that time, not only did we have the Sandia report as justification, but the National Academies of Sciences, Engineering, and Medicine (NASEM) had just released the report Reproducibility and Replicability in Science \cite{medicine2019}. It was the product of a long consensus study conducted by 15 experts, including myself, sponsored by the National Science Foundation in response to a Congressional mandate. The NASEM report put forth its definitions as:

Reproducibility is obtaining consistent results using the same input data, computational steps, methods and code, and conditions of analysis.

Replicability is obtaining consistent results across studies aimed at answering the same scientific question, each of which has obtained its own data.

The key contradiction with the ACM badging system resides in which term covers the use of the author-created digital artifacts (e.g., data and code).
We stated in the NISO working-group letter that if the ACM definitions of reproducible and replicable could be interchanged, the working group could move forward toward its goal of drafting recommended practices for badging that would lead to wider adoption by other technical societies and publishers. The ACM Publications Board responded positively and began working through the details of how to make changes to items already published in the Digital Library with the "Results Replicated" badge (about 188 affected items existed at that time). Over the summer of 2020, ACM applied changes to the published Artifact Review and Badging web pages and added a version number. From version 1.0, we see an added note that, as a result of discussions with NISO, ACM was harmonizing its terminologies with those used in the broader scientific research community.

All this background serves to draw our attention to the prolonged, thoughtful, and sometimes arduous efforts that have been directed at charting paths for adoption and giving structure to reproducibility and replicability in our research communities. Let us move now to why and how the HPC community might move forward.

Insights on transparent, reproducible HPC research

Deployed barely over a year ago, the NSF-funded Frontera system at the Texas Advanced Computing Center (TACC) came in as the 8th most powerful supercomputer in the world, and the fastest on a university campus. Up to 80% of the available time on the system is allocated through the NSF Petascale Computing Resource Allocation program. The latest round of Frontera allocations (as of this writing) was announced on October 25, 2020. I read through the fact sheet on the 15 newly announced allocations to get a sense of the types of projects in this portfolio. Four projects are machine-learning or AI-focused, the same number as those in astronomy and astrophysics, and one more than those in weather or climate modeling.
Other projects are single instances spanning volcanology/mantle mechanics, molecular dynamics simulations of ion channels, quantum physics in materials science, and one engineering project in fluid-structure interactions. One could gather these HPC projects into four groups.

Astronomy and astrophysics are mature fields that in general have high community expectations of openness and reproducibility. As I'll highlight below, however, even these communities with mature practices benefit from checks of reproducibility that uncover areas of improvement.

The projects tackling weather and climate modeling are candidates for being considered of high consequence to society. One example from the Frontera allocations concerns the interaction of clouds with aerosols caused by industrial activity: the clouds can end up composed of smaller droplets and become more reflective, resulting in a cooling effect on climate. Global climate models tend to overestimate this radiative forcing, potentially underestimating global warming: why? This is a question of great consequence for science-informed policy, in a subject already under elevated scrutiny from the public. Another project in this cluster deals with real-time, high-resolution ensemble forecasts of high-impact winter weather events. I submit that high standards of transparency, meticulous provenance capture, and investments of time and effort in reproducibility and quality assurance are justified in these projects.

Four of the winning projects apply machine-learning techniques to various areas of science. In one case, the researchers seek to bridge the gap in the trade-off between accuracy of prediction and model interpretability, to make ML more applicable in clinical and public-health settings.
This is clearly also an application of high consequence, but in addition all the projects in this subset face the particular transparency challenges of ML techniques, requiring new approaches to provenance capture and transparent reporting.

The rest of the projects are classic high-performance computational science applications in areas such as materials science, geophysics, and fluid mechanics. Reproducible-research practices vary broadly in these settings, but I feel confident saying that all or nearly all of these efforts would benefit from prospective data management, better software engineering, and more automated workflows. And their communities would grow stronger with more open sharing. The question I have is: how could the merit review of these projects nudge researchers toward greater transparency and reproducibility? Perhaps that is a question for later; a question to start with is how support teams at cyberinfrastructure facilities could work with researchers to facilitate the adoption of better practices in this vein. I'll revisit these questions later.

I also looked at the 2019 Blue Waters Annual Report, released on September 15, 2020, with highlights from a multitude of research projects that benefited from computing allocations on the system. Blue Waters went into full service in 2013 and has provided over 35 billion core-hour equivalents to researchers across the nation. The highlighted research projects fall into seven disciplinary categories, and include 32 projects in space science, 20 in geoscience, 45 in physics and engineering, and many more. I want to highlight just one of the many dozens of projects featured in the Blue Waters Annual Report, for the following reason: I searched the PDF for the word "Zenodo," and that project was the only one listing Zenodo entries in the "Publications & Data Sets" section that ends each project feature.
One other project (in the domain of astrophysics) mentions that data is available through the project website and in Zenodo, but doesn't list any data sets in the report. Zenodo is an open-access repository funded by the European Union's Framework Programmes for Research and operated by CERN. Some of the world's top experts in running large-scale research data infrastructure are at CERN, and Zenodo is hosted on infrastructure built in service of the largest high-energy physics laboratory in the world. Zenodo hosts any kind of data, under any license type (including closed access). It has become one of the most used archives for the open sharing of research objects, including software.

The project I want to highlight is "Molten-salt reactors and their fuel cycles," led by Prof. Kathryn Huff at UIUC. I've known Katy since 2014, and she and I share many perspectives on computational science, including a strong commitment to open-source software. This project deals with modeling and simulation of nuclear reactors and fuel cycles, combining multiple physics and multiple scales, with the goal of improving the design of nuclear reactors in terms of performance and safety. As part of the research enabled by Blue Waters, the team developed two software packages: Moltres, described as a first-of-its-kind finite-element code for simulating the transient neutronics and thermal hydraulics in a liquid-fueled molten-salt reactor design; and SaltProc, a Python tool for fuel-salt reprocessing simulation. The references listed in the project highlight include research articles in the Annals of Nuclear Energy, as well as the Zenodo deposits for both codes, and a publication about Moltres in the Journal of Open Source Software, JOSS. (As one of the founding editors of JOSS, I'm very pleased.)
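The practice of citing archived software alongside articles, as this project did, is easy to script into a publication workflow. The sketch below renders an archive record as a citation string; the record fields are modeled loosely on Zenodo/DataCite metadata, and every value (authors, package name, DOI) is hypothetical, for illustration only.

```python
# Illustrative sketch: turn a software-deposit record into a citation string
# that can be listed in a "Publications & Data Sets" section.
# Field names loosely follow Zenodo/DataCite metadata; values are made up.

def format_software_citation(meta):
    """Render a software deposit record as a human-readable citation."""
    authors = "; ".join(meta["creators"])
    return (f"{authors} ({meta['publication_year']}). "
            f"{meta['title']} (Version {meta['version']}) [Software]. "
            f"Zenodo. https://doi.org/{meta['doi']}")

record = {
    "creators": ["Doe, J.", "Roe, R."],       # hypothetical authors
    "title": "ExampleSolver",                 # hypothetical package
    "version": "1.2.0",
    "publication_year": 2020,
    "doi": "10.5281/zenodo.0000000",          # placeholder DOI
}

print(format_software_citation(record))
```

A helper like this can run in continuous integration, so every release's citation entry stays in sync with the deposited version.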
It is possible, of course, that other projects in the Blue Waters portfolio have also archived software in Zenodo or published their software in JOSS, but they did not mention it in this report and did not cite the artifacts. Clearly, the research context of the project I highlighted is of high consequence: nuclear reactor design. The practices of this research group show a high standard of transparency that should be the norm in such fields. Beyond transparency, the publication of the software in JOSS ensures that it was subject to peer review and that it satisfies standards of quality. JOSS reviewers install the software, run tests, and comment on usability and documentation, leading to quality improvements.

Next, I want to highlight the work of a group that includes CiSE editors Michela Taufer and Ewa Deelman, posted last month on arXiv \cite{e2020}. The work sought to directly reproduce the analysis that led to the 2016 discovery of gravitational waves, using the data and codes that the LIGO collaboration had made available to the scientific community. The data had previously been re-analyzed by independent teams using different codes, leading to replication of the findings, but no attempt had yet been made at reproducing the original results. In this paper, the authors report on the challenges they faced during the reproduction effort, even with the availability of data and code supplementing the original publication. A first challenge was the lack of a single public repository with all the information needed to reproduce the result. The team had the cooperation of one of the original LIGO team members, who had access to unpublished notes that proved necessary in the process of iteratively filling in the gaps of missing public information.
Other highlights of the reproduction exercise include: the original publication did not document the precise version of the code used in the analysis; the script used to make the final figure was not released publicly (but one co-author gave access to it privately); and the original documented workflow queried proprietary servers to access data, and needed to be modified to run with the public data instead. In the end, the result (the statistical significance of the gravitational-wave detection from a black-hole merger) was reproduced, but not independently of the original team, as one researcher is a co-author of both publications. The message here is that even a field that is mature in its standards of transparency and reproducibility needs checks to ensure that these practices are sufficient, or can be improved.

Science policy trends

The National Academies study on Reproducibility and Replicability in Science was commissioned by the National Science Foundation under a Congressional mandate, with the charge coming from the Chair of the Science, Space, and Technology Committee. NASEM reports and convening activities have a range of impacts on policy and practice, and often guide the direction of federal programs. NSF is in the process of developing its agency response to the report, and we can certainly expect to hear more in the future about requirements and guidance for researchers seeking funding.

The recommendations in the NASEM report are directed at all the various stakeholders: researchers, journals and conferences, professional societies, academic institutions and national laboratories, and funding agencies. Recommendation 6-9, in particular, prompts funders to ask that grant applications discuss how uncertainties will be assessed and reported, and how the proposed work will address reproducibility and/or replicability issues. It also recommends that funders incorporate reproducibility and replicability into the merit-review criteria of grant proposals.
Combined with related trends urging more transparency and public access to the fruits of government-funded research, we need to be aware of the shifting science-policy environment.

One more time, I have a reason to thank Mike Heroux, who took time for a video call with me as I prepared my SC20 invited talk. In his position as Senior Scientist at Sandia, one-fifth of his time is spent in service to the lab's activities, and this includes serving on the review committee of the internal Laboratory Directed Research & Development (LDRD) grants. As it is an internal program, the calls for proposals are not available publicly, but Mike told me that they now contain specific language asking proposers to include statements on how the project will address transparency and reproducibility. These aspects are discussed in the proposal review and are a factor in the decision-making. As community expectations grow, it could happen that between two proposals ranked equally on the science, the tie-break comes from one of them better addressing reproducibility. Already some teams at Sandia are performing at a high level: for example, they produce an Artifact Description appendix for every publication they submit, regardless of the conference or journal requirements.

We don't know if or when NSF might add similar stipulations to its general grant proposal guidelines, asking researchers to describe transparency and reproducibility in the project narrative. One place where we see the agency starting to respond to shifting expectations about the open sharing of research objects is the section on results from prior funding. NSF currently requires here a listing of publications from prior awards, and "evidence of research products and their availability, including …data [and] software."

I want to again thank Beth Plale, who took time to meet with me over video and sent me follow-up materials to use in preparing my SC20 talk.
In March 2020, NSF issued a "Dear Colleague Letter" (DCL) on Open Science for Research Data, with Beth then acting as the public access program director. The DCL says that NSF is expanding its Public Access Repository (NSF PAR) to accept metadata records, leading to data discovery and access. It requires research data to be deposited in an archival service and assigned a Digital Object Identifier (DOI), a global and persistent link to the object on the web. A grant proposal's Data Management Plan should state the anticipated archive to be used, and include any associated cost in the budget. Notice this line: "Data reporting will initially be voluntary." This implies that it will later be mandatory! The DCL also invited proposals aimed at growing community readiness to advance open science.

At the same time, the Office of Science and Technology Policy (OSTP) issued a Request for Information (RFI) early this year asking what Federal agencies could do to make the results of the research they fund publicly accessible. The OSTP sub-committee on open science is very active. An interesting and comprehensive response to the OSTP RFI comes from the MIT Libraries. It recommends, among other things:

Policies that default to open sharing for data and code, with opt-out exceptions available [for special cases]…

Providing incentives for sharing of data and code, including supporting credentialing and peer-review; and encouraging open licensing.

Recognizing data and code as “legitimate, citable products of research” and providing incentives and support for systems of data sharing and citation…

The MIT Libraries response addresses various other themes, like responsible business models for open-access journals, and federal support for the vital infrastructure needed to make open access to research results more efficient and widespread.
It also recommends that Federal agencies provide incentives for documenting and raising the quality of data and code, and "promote, support, and require effective data practices, such as persistent identifiers for data, and efficient means for creating auditable and machine readable data management plans."

To boot, the National Institutes of Health (NIH) just announced, on October 29, a new policy on data management and sharing. It requires researchers to plan prospectively for managing and sharing scientific data openly, saying: "we aim to shift the culture of research to make data sharing commonplace and unexceptional."

Another setting where we could imagine expectations to discuss reproducibility and open research objects is proposals for allocations of computing time. For this section, I need to thank John West, Director of Strategic Initiatives at the Texas Advanced Computing Center (and CiSE Associate EiC), who took time for a video call with me on this topic. We bounced around ideas about how cyberinfrastructure providers might play a role in growing the adoption of reproducibility practices. Currently, the NSF science proposal and the computing allocation proposal are awarded separately. The Allocation Submission Guidelines discuss review criteria, which include: intellectual merit (demonstrated by the NSF science award), methodology (models, software, analysis methods), research plan and resource request, and efficient use of the computational resources. For the most part, researchers have to show that their application scales to the size of the system they are requesting time on. Interestingly, the allocation award is not tied to performance: researchers are not asked to show that their codes are optimized, only that they scale and that the research question can feasibly be answered in the allocated time.
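The scaling evidence in an allocation request often reduces to a simple strong-scaling efficiency calculation: speedup relative to a baseline, divided by the processor count. A back-of-envelope sketch follows; the wall-clock timings are invented purely for illustration, not taken from any real project.

```python
# Back-of-envelope strong-scaling check of the kind an allocation request
# might include. All timings below are hypothetical.

def parallel_efficiency(t_serial, t_parallel, n_procs):
    """Strong-scaling efficiency: speedup (t_serial / t_parallel)
    divided by the number of processors."""
    return (t_serial / t_parallel) / n_procs

# Hypothetical wall-clock times (seconds) for one fixed problem size.
t1 = 12800.0                                # invented single-core baseline
runs = {64: 210.0, 128: 110.0, 256: 60.0}   # invented parallel timings

for p, tp in sorted(runs.items()):
    print(f"{p:4d} ranks: efficiency {parallel_efficiency(t1, tp, p):.2f}")
```

Reviewers typically look for efficiency staying near 1.0 as the rank count grows; a steep drop marks the point beyond which requesting more of the machine is wasteful.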
Responsible stewardship of the supercomputing system is provided for via close collaboration between the researchers and the members of the supercomputing facility. Codes are instrumented under the hood with low-overhead collection of system-wide performance data (in the UT facility, with TACC-Stats), and a web interface produces reports.

I see three opportunities here: 1) workflow-management and/or system-monitoring tools could be extended to also supply automated provenance capture; 2) the expert staff at the facility could broaden their support to researchers to include advice and training in transparency and reproducibility matters; and 3) cyberinfrastructure facilities could expand their training initiatives to include essential skills for reproducible research. John floated other ideas, like the possibility that some projects be offered a bump in their allocations (say, 5% or 10%) to engage in reproducibility and replicability activities; or, more drastic perhaps, that projects not be awarded allocations over a certain threshold unless they show commitment and a level of maturity in reproducibility.

Next steps for HPC

The SC Transparency and Reproducibility Initiative is one of the innovative early efforts to gradually raise expectations and educate a large community about how to address reproducibility and why it matters. Over six years, we have built community awareness and buy-in. This year's community sentiment study shows clear progress: 90% of respondents are aware of the issues around reproducibility, and only 15% thought the concerns are exaggerated. Importantly, researchers report that they are consulting the artifact appendices of technical papers, signaling impact.
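Automated provenance capture, the first of the opportunities mentioned above, need not be elaborate to be useful. A minimal sketch of a per-run provenance record follows; the field names are my own invention for illustration, not the schema of TACC-Stats or any existing tool.

```python
# Sketch of a per-run provenance record that a workflow manager or
# monitoring layer could emit automatically. Field names are invented.

import json
import platform
import sys
import time

def provenance_record(command, inputs, code_version):
    """Collect basic run provenance into a JSON-serializable dict."""
    return {
        "command": command,                # what was executed
        "inputs": inputs,                  # input files or parameters
        "code_version": code_version,      # e.g. a git commit hash
        "python": sys.version.split()[0],  # toolchain version
        "hostname": platform.node(),       # where it ran
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
    }

# Hypothetical solver invocation and inputs, for demonstration.
rec = provenance_record("solver --nx 1024", ["mesh.h5"], "deadbeef")
print(json.dumps(rec, indent=2))
```

Emitting such a record alongside every job's output, with no researcher action required, is exactly the kind of "under the hood" instrumentation facilities already do for performance data.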
As a community, we are better prepared to adapt to rising expectations from funders, publishers, and readers.

The pandemic crisis has unleashed a tide of actions to increase access and share results: the COVID-19 Open Research Dataset (CORD-19) is one example \cite{al2020}; the COVID-19 Molecular Structure and Therapeutics Hub at MolSSI is another. Facing a global challenge, we as a society are strengthened by facilitating immediate public access to data, code, and published results. This point has been made by many in recent months, but perhaps most eloquently by Rommie Amaro and Adrian Mulholland in their Community Letter Regarding Sharing Biomolecular Simulation Data for COVID-19, signed by more than a hundred researchers from around the world \cite{j2020}. It says: "There is an urgent need to share our methods, models, and results openly and quickly to test findings, ensure reproducibility, test significance, eliminate dead-ends, and accelerate discovery." It then follows with several commitments: to make results available quickly via preprints; to make available the input files, model-building and analysis scripts (e.g., Jupyter notebooks), and data necessary to reproduce the results; to use open data-sharing platforms to make results available as quickly as possible; to share algorithms and methods in order to accelerate reuse and innovation; and to apply permissive open-source licensing strategies. Interestingly, these commitments are reminiscent of the pledges I made in my Reproducibility PI Manifesto \cite{barba2012} eight years ago!

One thing the pandemic instantly provided is a strong incentive to participate in open science and attend to reproducibility. The question is how much of the newly adopted practice will persist once the incentive of a world crisis is removed.

I've examined here several issues of incentives for transparent and reproducible research.
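A commitment to share the input files and data needed to reproduce results pairs naturally with a checksum manifest, so that others can verify they obtained exactly the bytes the authors shared. A minimal sketch, assuming nothing beyond the standard library (the input file here is created on the fly purely for demonstration):

```python
# Minimal sketch of an artifact manifest: SHA-256 checksums let others
# verify that downloaded inputs match what the authors shared.

import hashlib
import pathlib
import tempfile

def manifest(paths):
    """Map each file path to the SHA-256 digest of its contents."""
    out = {}
    for p in paths:
        out[str(p)] = hashlib.sha256(pathlib.Path(p).read_bytes()).hexdigest()
    return out

# Create a throwaway file standing in for a shared input artifact.
tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "input.dat").write_bytes(b"example input data")

m = manifest([tmp / "input.dat"])
for path, digest in m.items():
    print(digest[:12], path)
```

Shipping such a manifest next to the data (and checking it before an analysis runs) is a small step toward the provenance guarantees discussed throughout this piece.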
But social epistemologists of science know that the so-called Mertonian norms (for sharing widely the results of research) are supported by both economic and ethical factors, incentives and norms, in close interrelation. Social norms require a predominant normative expectation (for example, the sharing of food in a given situation and culture). In the case of the open sharing of research results, those expectations are not yet predominant, due to researchers' sensitivity to credit incentives. Heesen \cite{heesen2017} concludes: "Give sufficient credit for whatever one would like to see shared ... and scientists will indeed start sharing it."

In HPC settings, where we can hardly ever reproduce results (due to machine access, cost, and effort), a vigorous alignment with the goals of transparency and reproducibility will develop a blend of incentives and norms, give special consideration to the applications of high consequence to society, and support researchers with infrastructure (human and cyber). Over time, we will arrive at a level of maturity where we achieve the goal of trustworthy computational evidence not by actually exercising the open research objects (artifacts) shared by authors (data and code), but through a research process that ensures unimpeachable provenance.

Hossein Firoozabadi

and 2 more

Bio-photovoltaic devices (BPVs) harness photosynthetic organisms to produce bioelectricity in an eco-friendly way. However, their low energy efficiency is still a challenge. Understanding metabolic constraints can reveal strategies for efficiency enhancement. This study presents a systemic approach based on metabolic modeling to design a regulatory defined medium, reducing the intracellular constraints on bioelectricity generation in Synechocystis sp. PCC6803 through alteration of cellular metabolism. The approach identified key reactions that played a critical role in improving electricity generation in Synechocystis sp. PCC6803 by comparing multiple optimal solutions of minimal and maximal NADH generation using two criteria. Regulatory compounds, which controlled the enzyme activity of the key reactions, were obtained from the BRENDA database. The selected compounds were subsequently added to the culture media, and their effect on bioelectricity generation was experimentally assessed. The power density curves for different culture media showed that the BPV fed by a Synechocystis sp. PCC6803 suspension in BG-11 supplemented with NH4Cl achieved the maximum power density of 148.27 mW m-2. This power density was more than 40.5-fold that obtained for the BPV fed with a cyanobacterial suspension in BG-11 alone. The effect of the activators on BPV performance was also evaluated by comparing overpotential, maximum produced power density, and biofilm morphology under different conditions. These findings demonstrate the crucial role of cellular metabolism in improving bioelectricity generation in BPVs.

Pritish Mondal

and 4 more

Rationale: Gas exchange abnormalities in Sickle Cell Disease (SCD) may represent cardiopulmonary deterioration. Identifying predictors of these abnormalities in children with SCD (C-SCD) may help us understand disease progression and develop informed management decisions. Objectives: To identify pulmonary function tests (PFT) and biomarkers of systemic disease severity that are associated with and predict abnormal carbon monoxide diffusing capacity (DLCO) in C-SCD. Methods: We obtained PFT data from 51 C-SCD (115 observations) and 22 controls, and identified predictors of DLCO for further analyses. We formulated a rank list of DLCO predictors based on machine learning algorithms (XGBoost) or linear mixed-effect models and compared estimated DLCO to the measured values. Finally, we evaluated the association between measured and estimated DLCO and clinical outcomes, including SCD crises, pulmonary hypertension, and nocturnal hypoxemia. Results: DLCO and several PFT indices were diminished in C-SCD compared to controls. Both statistical approaches ranked FVC%, neutrophils(%), and FEV25%-75% as the top three predictors of DLCO. XGBoost had superior performance compared to the linear model. Both measured and estimated DLCO demonstrated significant association with SCD severity indicators. DLCO estimated by XGBoost was associated with SCD crises (beta=-0.084 [95%CI -0.134, -0.033]) and with TRJV (beta=-0.009 [-0.017, -0.001]), but not with nocturnal hypoxia (p=0.121). Conclusions: In this cohort of C-SCD, DLCO was associated with PFT estimates representing restrictive lung disease (FVC%), airflow obstruction (FEV25%-75%), and inflammation (neutrophil%). We were able to use these indices to estimate DLCO and show association with disease outcomes, underscoring the prediction models' clinical relevance.
