Public Articles

SMMP - Stochastic Methods for Molecular Properties

Roberto Di Remigio

Possible titles:

Stochastic Methods for Molecular Properties (SMMP)
Stochastic Methods for Chiroptical Properties (SMCP or ChiroStoch)
Deterministic methods need large Hilbert spaces for effective expansions of the many-electron wave function.
Such expansions are, however, largely redundant \cite{Ivanic_2001}.
Stochastic algorithms are highly parallelizable in the number of walkers.
I will develop my skills in parallel programming techniques by developing this project.

Research questions:

Understanding chiroptical properties for large chemical systems

Objectives of the project:

The calculation of molecular properties with high accuracy and for systems of relevant size
1. High accuracy means coupled cluster (CC) wave functions
2. Systems for which CC is an option are limited by its polynomial scaling
3. It is possible to reduce the scaling, e.g. by means of local approaches to the electron correlation problem,
  but this has been proven to not be as effective for molecular properties as for energies
Devise the appropriate stochastic approach to the solution of response equations
1. We want the stochastic approach because it's (supposedly) embarrassingly parallel.
  This can enable the study of the response properties for larger systems and provide benchmark
  results for lower level calculations.
2. Low scaling (or better parallelization options) + (perhaps?) controllable error of stochastic methods
  are the key advantages. Find relevant literature on both!
  \cite{Booth_2014}, \cite{Coccia_2012}
The creation of the appropriate software toolbox with good scalability.
1. The toolbox will be freely available under the appropriate open source license (GPL most likely!)

General background on quantum chemistry:

"Frontiers in electronic structure theory" \cite{Sherrill_2010}
Quantum chemistry as an effective complement to experiment \cite{Lee_1995}, \cite{Helgaker_2004}, \cite{Tajti_2004}, \cite{Helgaker_2000}, \cite{Goddard_1985}

State of the art:

QMC:
- Recent reviews on QMC approaches: \cite{Dubeck__2016}, \cite{Toulouse_2016}, \cite{Austin_2012}, \cite{Needs_2009}, \cite{Assaraf_2007}, \cite{Towler_2006} and \cite{Foulkes_2001}
- DMC (Wick rotation of the Schrödinger equation \cite{Wick_1954} and isomorphism with a classical diffusion problem)
- Self-Healing QMC
- Auxiliary Field QMC
- Fermion QMC
- FCIQMC
- Stochastic Coupled Cluster: Thom introduces the FCIQMC-like stochastic algorithm for solving the CC equations \cite{Thom_2010}
  The method leverages the stochastic sampling strategies of the FCI wave function in a discrete Fock space first proposed by Alavi _et al._\cite{Booth_2009}
- Linked Coupled Cluster Monte Carlo: the stochastic algorithm for the solution of the CC equations in the linked (term-by-term size-extensive) form \cite{Franklin_2016}
- Initiator approximation: the same group proposes the initiator approximation for the CCMC algorithm \cite{Spencer_2016}
- Stochastic Møller-Plesset \cite{Thom_2007}
- Local approaches to QMC \cite{Manten_2003}, \cite{Williamson_2001}
Properties by QMC
- Dynamic polarizabilities \cite{Caffarel_1993}, \cite{Mella_2001}
- Large systems \cite{Filippi_2012}, \cite{Valsson_2010}
- Forces: correlated sampling \cite{Filippi_2000} and space warp coordinate transformation \cite{Umrigar_2009}
  Work by Assaraf and Caffarel on improved estimators \cite{Assaraf_2000} and \cite{Assaraf_2003}
- Static electric properties (dipole and quadrupole moments, static polarizabilities): polarizabilities by finite differences for ethyne \cite{Coccia_2012}
  polarizability of the hydrogen atom by modified sampling \cite{Li_2007}
Chiroptical properties:
- Eliel "Stereochemistry of organic compounds" (find appropriate ref)
- Barron's book \cite{Barron_2004} Rosenfeld's formulation \cite{Rosenfeld_1929}
- OR and ECD review by Pecul and Ruud \cite{Pecul_2005}
- Berova's book \cite{Berova_2011}
- Daniel's reviews \cite{Crawford_2005}, \cite{Crawford_2007a}, \cite{Crawford_2012}
- Octant rule \cite{Snatzke_1979} and its failure \cite{Rinderspacher_2004}
- Chiroptical properties by DFT \cite{Cheeseman_2000}, \cite{Furche_2000}, \cite{Grimme_2001} \cite{Stephens_2001} \cite{Stephens_2002} \cite{Grimme_2002} \cite{Autschbach_2002}
  \cite{Autschbach_2002b} \cite{Autschbach_2003}
- Chiroptical properties by CC \cite{Tam_2004} \cite{Crawford_2005} \cite{Ruud_2002} \cite{Ruud_2003}
- Computational studies available on a variety of molecules \cite{Tam_2006} \cite{Kowalczyk_2006} \cite{Crawford_2005} \cite{Tam_2007} \cite{Crawford_2007a} \cite{Crawford_2007b}
  \cite{Wiberg_2008}\cite{Crawford_2008} \cite{Crawford_2009} \cite{Pedersen_2009} \cite{Pedersen_2009b} \cite{Lambert_2012} \cite{Rinderspacher_2004} \cite{Wilson_2005}
  \cite{Furche_2000} \cite{Pulm_1997} \cite{Kondru_1998} \cite{Grimme_1998} \cite{Polavarapu_1999} \cite{Ribe_2000} \cite{Polavarapu_2002} \cite{Diedrich_2003} \cite{Polavarapu_2003} \cite{Norman_2004} \cite{Diedrich2004-xo} \cite{Stephens_2005} \cite{Wiberg_2005a} \cite{Wiberg_2005b} \cite{Wiberg_2006} \cite{Autschbach_2009} \cite{Pritchard_2010} \cite{Mach_2011} \cite{Mach_2014}
- Ultrasensitive Cavity Ring-Down Polarimetry (CRDP) \cite{Wilson_2005} \cite{M_ller_2002} \cite{M_ller_2000}
- Local approaches to the correlation problem in response theory \cite{Russ_2004} \cite{Russ_2008}
Response theory:
- Most recent review on wave function-based response theory \cite{Helgaker_2012}
- Foundational work \cite{Olsen_1985}, \cite{Christiansen_1998}, \cite{Paw_owski_2015}, \cite{Coriani_2016}
- CC response theory
- Local approaches to CC (really a lot of literature...)
- Local approaches to CC response theory \cite{Friedrich_2015}, \cite{McAlexander_2012}, \cite{McAlexander_2016}

Problems to address:

The fermion sign problem. How do FCIQMC and CCMC avoid it?
The sign problem is NP-hard \cite{Troyer_2005} and thus not solvable in polynomial time unless P = NP

TODO:

Size of the systems investigated by stochastic methods in Fock space?
Properties by stochastic methods? VMC/DMC/FCIQMC?
Local correlation approaches for molecular properties?
L. Guidoni might have published some FCIQMC calculations on large molecular systems.

King Chicken Theorem

Maritza

In graph theory, directed graphs can be used to understand tournaments and theorems such as the king chicken theorem. To understand the king chicken theorem, we first go over some terminology. A tournament is a directed graph in which every pair of distinct vertices is joined by exactly one edge, and each edge has a specific orientation. Tournament graphs are used to show relationships between players, namely who beat whom in a tournament. They are most often drawn as complete graphs with an arrow on each edge. Any edge that points from $i$ to $j$ is directed, with orientation from $i$ to $j$.

Detailed Reviewer Responses

Anisha Keshavan

and 3 collaborators

We would like to thank the reviewers for their insightful comments. The major points that have been addressed are as follows:

It was not our intention to give the impression that one needs to scan human calibration phantoms at each site to properly power a multisite study with nonstandardized parameters, which is very costly. The statistical model which takes MRI bias into account has been emphasized instead. The bias that was measured and validated via calibration served to corroborate the scaling assumption of the statistical model. For other researchers planning multisite studies, the statistical model we proposed with the biases we reported should help plan and power a study.
Our measurements have been compared with other harmonization efforts, specifically \cite{cannon2014, jovicich2013brain} and \cite{Schnack_2004}.
The scanning parameters of our consortium have been better specified.
The independence assumption between the unobserved effect and the scaling factor for a particular site has been addressed. Specifically, we emphasized that this assumption could hold for MS patients based on our experiment. We recommend validating this assumption for other situations by scanning human phantoms, and we provide the equation for the variance without the independence assumption.

Blog Post 10

Rachael Sharp

1 $\underline{\text{Trees-A Branch of Discrete Mathematics}}$ Trees provide poets with inspiration as they sway through the breeze and their leaves, bursting with color, rustle in the wind. It is no wonder, then, that mathematicians coined the term “tree” in describing special classes of structured graphs. One author, Joe Malkevitch, makes it his goal to “convince you (readers) that mathematical trees are no less lovely than their biological counterparts.”

In discrete mathematics, and more specifically graph theory, a tree is a connected graph with no cycles. A graph with no cycles that is not necessarily connected is called a forest, so each connected component of a forest is a tree. In addition, a vertex of degree 1 is called a leaf. These kinds of mathematical structures were first studied by the mathematician Arthur Cayley. In 1889 Cayley published a formula stating that for n ≥ 1, the number of labeled trees on n vertices is n^(n − 2).

A few other properties of trees include the following:

Given two vertices, x and y, there is a unique path from x to y
If we remove any edge of a tree, the graph is no longer connected
If a tree has n vertices, then it has n − 1 edges
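
Cayley's formula, n^(n − 2) labeled trees on n vertices, and the edge-count property can be checked by brute force for small n. The sketch below is only an illustration: the function names are hypothetical, and the approach (try every set of n − 1 edges and keep the acyclic ones) is chosen for clarity, not efficiency.

```python
from itertools import combinations


def is_tree(n, edges):
    """Return True if the given n - 1 edges on vertices 0..n-1 contain no cycle.

    A graph with n vertices, n - 1 edges and no cycle is connected,
    so it is a tree.
    """
    parent = list(range(n))  # union-find forest

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:  # this edge would close a cycle
            return False
        parent[root_u] = root_v
    return True


def count_labeled_trees(n):
    """Count the trees on n labeled vertices by checking every set of n - 1 edges."""
    all_edges = list(combinations(range(n), 2))
    return sum(is_tree(n, chosen) for chosen in combinations(all_edges, n - 1))


if __name__ == "__main__":
    for n in range(2, 6):
        print(n, count_labeled_trees(n), n ** (n - 2))
```

Each line printed by the loop shows n, the brute-force count, and the value of n^(n − 2), so the two columns can be compared directly.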

The concept of mathematical trees has applications in various fields including science, the enumeration of saturated hydrocarbons, the study of electrical circuits, and many more (Harary, 1994, p. 4).

Designing Efficient Route-Stops using Agent-Based Simulation

Theresa Mendoza

and 2 collaborators

Abstract

Functional consequence of SNPs on the Tuberculosis drug metabolising enzyme, human arylamine N-acetyltransferase 1

Alan Christoffels

Abstract

Background

The human arylamine N-acetyltransferase 1 (NAT1) plays a vital role in determining the duration of action and pharmacokinetics of amine-containing drugs such as para-aminosalicylic acid used in clinical therapy, as well as influencing the balance between detoxification and metabolic activation of these drugs. Single nucleotide polymorphisms (SNPs) in this enzyme are continuously being detected and show inter-ethnic and inter-individual variation. Administering tuberculosis (TB) treatment in the absence of genotypic information for drug metabolizing enzymes can limit the successful eradication of the disease from a patient. Recent studies have shown that loss of H-bonds affects protein function.

Results: In this study, the effects of 11 novel non-synonymous SNPs (nsSNPs) on the structure and function of NAT1 were tested computationally using the SIFT and POLYPHEN-2 algorithms and structural analysis methods including loss of hydrogen-bonding, stability calculation, solvent accessibility and sequence conservation. Four out of 11 nsSNPs (Q210P, D229H, V231G and V235A) were predicted to affect protein function using both algorithms. Two of these four SNPs showed a loss of 2-4 hydrogen bonds and in most cases a destabilized protein structure. Another two SNPs (F202V, N245I) were predicted to affect protein function using both algorithms but without any loss of hydrogen-bonds. Three additional nsSNPs (T240S, S259R, T193S) were predicted to be benign with either a loss of three hydrogen bonds or no loss of hydrogen-bonds. The remaining two nsSNPs (E264K and R242M) showed conflicting results between SIFT and POLYPHEN-2 and both cases showed stable Gibbs free energy. No correlation could be identified between the predicted functional effects from SIFT and POLYPHEN-2, and the stability calculations and the hydrogen-bonding analyses. However, the structural effects of modifying an amino acid together with the conflicting results from both algorithms warrant experimental testing to resolve the consequences of these 11 novel nsSNPs on NAT1.

Conclusion: The nsSNPs that affect protein function and/or have a destabilized structure provide a prioritized list of SNPs that will be tested in the laboratory by creating a SNP construct that will be cloned into an expression vector. These findings will inform a strategy of incorporating genotypic data (i.e., functional SNP alleles) with phenotypic information (slow or fast acetylators) to better prescribe effective tuberculosis treatment.

Tournament Graphs

Rikki

Tournament graphs are used in discrete mathematics to represent the outcomes of a competition. A tournament is a complete graph in which every pair of vertices is connected by a single directed edge. These graphs are referred to as tournaments because each of the $n$ players competes against each of the other $n-1$ players, ties are not allowed, and the winner of every game can be represented on the graph. The graph is created by assigning every player a vertex; if player $1$ beats player $2$, then a directed edge is drawn with the arrow pointing from $1$ to $2.$ Every tournament contains a Hamiltonian path, that is, a directed path that goes through each vertex exactly once. The Hamiltonian path theorem states that for $n\ge1,$ any tournament on $n$ vertices admits an ordering of its vertices $v_1,v_2,\ldots,v_n$ such that $v_1\rightarrow v_2\rightarrow\cdots\rightarrow v_n.$
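
The Hamiltonian path theorem can be made concrete with a short sketch. The code below uses a standard insertion argument, which is an assumption on our part and not necessarily the proof the post has in mind: players are added one at a time, and each new player is placed directly in front of the first player on the path that it beats (or at the end if it beats nobody). The names `random_tournament` and `hamiltonian_path` are hypothetical helpers introduced for illustration.

```python
import random


def random_tournament(n, seed=0):
    """Return an n x n matrix with beats[i][j] True when player i beats player j."""
    rng = random.Random(seed)
    beats = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Exactly one direction per pair: ties are not allowed.
            if rng.random() < 0.5:
                beats[i][j] = True
            else:
                beats[j][i] = True
    return beats


def hamiltonian_path(beats):
    """Order all players so that each one beats the next one in the list."""
    path = []
    for v in range(len(beats)):
        for k, u in enumerate(path):
            if beats[v][u]:
                # v beats path[k]; every earlier player was not beaten by v,
                # so path[k-1] (if it exists) beats v.
                path.insert(k, v)
                break
        else:
            # v beats nobody on the path, so the last player beats v.
            path.append(v)
    return path


if __name__ == "__main__":
    beats = random_tournament(6, seed=1)
    print(hamiltonian_path(beats))
```

The printed list is one ordering $v_1,\ldots,v_n$ with $v_1\rightarrow v_2\rightarrow\cdots\rightarrow v_n$; a tournament may have several such paths.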

Autopledge

Bacon

and 2 collaborators

Even when good coding practices are followed, arbitrary code execution bugs can still exist. By leveraging the pledge(2) system call and a static analysis framework, we attempt to mitigate these bugs by automatically inserting pledge calls. Although an algorithm was devised to do this, time limitations prevented its full implementation.

Welcome to Authorea!

Antonio

Double-click this text to start writing.

Menjurje Aminoácido

Jeff Bouquet

11:45 Pm

Supervised Learning: Classification and Regression

Naets

and 2 collaborators

Regression

\label{RegSection}

Contactless Remote Induction of Shear Waves in Soft Tissues Using a Transcranial Magnetic Stimulation Device

Pol Grasland Mongrain

This study presents the first observation of shear waves induced remotely within soft tissues. It was performed through the combination of a transcranial magnetic stimulation device and a permanent magnet. A physical model based on Maxwell and Navier equations was developed. Experiments were performed on a cryogel phantom and a chicken breast sample. Using an ultrafast ultrasound scanner, shear waves with amplitudes of 5 and 0.5 micrometers, respectively, were observed. Experimental and numerical results were in good agreement. This study constitutes the framework of an alternative shear wave elastography method.

Direct measurement of $\alpha_{\rm QED}(m_{\rm Z}^{2})$ at the FCC-ee

Patrick Janot

and 7 collaborators

When the measurements from the FCC-ee become available, an improved determination of the standard-model “input” parameters will be needed to fully exploit the new precision data towards either constraining or fitting the parameters of beyond-the-standard-model theories. Among these input parameters is the electromagnetic coupling constant estimated at the Z mass scale, $\alpha_{\rm QED}(m^2_{\rm Z})$. The measurement of the muon forward-backward asymmetry at the FCC-ee, just below and just above the Z pole, can be used to make a direct determination of $\alpha_{\rm QED}(m^2_{\rm Z})$ with an accuracy deemed adequate for an optimal use of the FCC-ee precision data.

MATH stuff

Namgyun Lee

Complex derivative

Here we provide a definition for the ’complex’ derivative of a real-valued function f : ℂⁿ → ℝ with respect to its complex variables. The notation f : ℂⁿ → ℝ means “f is a mapping (or function) from the set of column vectors of size n with complex components (denoted ℂⁿ) into the set of real numbers (denoted ℝ).”

The complex derivative of a function f with respect to x = a + jb ∈ ℂ, a, b ∈ ℝ, is defined as \begin{equation} \frac{df}{dx} = \frac{df}{da} + j\frac{df}{db}. \end{equation}

Example 1.

Given x = a + jb ∈ ℂ, a, b ∈ ℝ, What is D|x|?

Solution:
We have \begin{equation} |x| = \sqrt{x^*x} = \sqrt{(a-jb)(a+jb)} = \sqrt{a^2 + b^2}. \nonumber \end{equation} Applying the definition of the complex derivative yields \begin{eqnarray} \frac{d|x|}{dx} &=& \frac{d|x|}{da} + j\frac{d|x|}{db} \nonumber\\ &=& \frac{2a}{2\sqrt{a^2 + b^2}} + j\frac{2b}{2\sqrt{a^2 + b^2}} \nonumber\\ &=& \frac{a}{\sqrt{a^2 + b^2}} + j\frac{b}{\sqrt{a^2 + b^2}} \nonumber\\ &=& \frac{x}{|x|}. \nonumber \end{eqnarray}

Example 2.

Given x = a + jb ∈ ℂ, a, b ∈ ℝ, What is D|x|²?

Solution:
We have \begin{equation} |x|^2 = x^*x = (a-jb)(a+jb) = a^2 + b^2. \nonumber \end{equation} Applying the definition of the complex derivative yields \begin{eqnarray} \frac{d|x|^2}{dx} &=& \frac{d|x|^2}{da} + j\frac{d|x|^2}{db} \nonumber\\ &=& 2a + j2b \nonumber\\ &=& 2x. \nonumber \end{eqnarray} Suppose f : ℂⁿ → ℝ is a real-valued function and $x \in {\mathop{\bf int}}{\mathop{\bf dom}}f$. The derivative Df(x) is a 1 × n matrix (a row vector), defined by \begin{equation} \label{eqn:derivative} Df(x) = \left[ \frac{\partial f}{\partial x_1}(x), \dots, \frac{\partial f}{\partial x_n}(x) \right]. \end{equation}

Example 3.

Given x = [x₁, …, x_n]^T ∈ ℂⁿ with x_i = a_i + jb_i ∈ ℂ, a_i, b_i ∈ ℝ, What is D∥x∥_ℓ₂²?

Solution:
We have \begin{eqnarray} \|x\|_{\ell_2}^2 &=& \sum_{i=1}^n |x_i|^2 = \sum_{i=1}^n x_i^*x_i \nonumber\\ &=& \sum_{i=1}^n (a_i +jb_i)^*(a_i +jb_i) \nonumber\\ &=& \sum_{i=1}^n (a_i -jb_i)(a_i +jb_i) \nonumber\\ &=& \sum_{i=1}^n (a_i^2 +b_i^2). \nonumber \end{eqnarray} We first look at the first element of Equation [eqn:derivative] with f(x)=∥x∥_ℓ₂². Applying the definition of the complex derivative gives \begin{eqnarray} \frac{\partial \|x\|_{\ell_2}^2}{\partial x_1} &=& \frac{\partial \|x\|_{\ell_2}^2}{\partial a_1} + j\frac{\partial \|x\|_{\ell_2}^2}{\partial b_1} \nonumber\\ &=& \frac{\partial }{\partial a_1} \left(\sum_{i=1}^n (a_i^2 +b_i^2)\right) + j\frac{\partial }{\partial b_1} \left(\sum_{i=1}^n (a_i^2 +b_i^2)\right) \nonumber\\ &=& 2a_1 + j2b_1 \nonumber\\ &=& 2x_1. \nonumber \end{eqnarray} Therefore, it follows that \begin{eqnarray} Df(x) &=& \left[ \frac{\partial \|x\|_{\ell_2}^2}{\partial x_1}, \dots, \frac{\partial \|x\|_{\ell_2}^2}{\partial x_n} \right] \nonumber\\ &=& \left[2x_1, \ldots, 2x_n \right] \nonumber\\ &=& 2x^T. \nonumber \end{eqnarray}
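
The results of Examples 1 to 3 can be checked numerically. The sketch below is an illustration only: it approximates the two real partial derivatives in the definition by central differences, and the helper names (`complex_derivative`, `gradient`, `sq_norm`) and the step size `h` are hypothetical choices, not part of the original notes.

```python
def complex_derivative(f, x, h=1e-6):
    """Approximate df/da + j*df/db at x = a + jb by central differences."""
    d_da = (f(x + h) - f(x - h)) / (2 * h)            # partial derivative in a
    d_db = (f(x + 1j * h) - f(x - 1j * h)) / (2 * h)  # partial derivative in b
    return d_da + 1j * d_db


def gradient(f, xs, h=1e-6):
    """Approximate the row vector Df(x) for a real-valued f of a list of complex numbers."""
    grad = []
    for i in range(len(xs)):
        def f_i(z, i=i):
            # Vary only the i-th component, keep the others fixed.
            return f(xs[:i] + [z] + xs[i + 1:])
        grad.append(complex_derivative(f_i, xs[i], h))
    return grad


def sq_norm(xs):
    """Squared l2 norm: the sum of |x_i|^2."""
    return sum(abs(z) ** 2 for z in xs)


if __name__ == "__main__":
    x = 3.0 - 4.0j
    print(complex_derivative(abs, x), x / abs(x))                 # Example 1
    print(complex_derivative(lambda z: abs(z) ** 2, x), 2 * x)    # Example 2
    xs = [1.0 + 2.0j, -0.5 + 0.25j, 2.0 - 1.0j]
    print(gradient(sq_norm, xs), [2 * z for z in xs])             # Example 3
```

Each printed pair should agree up to the finite-difference error, which is consistent with D|x| = x/|x|, D|x|² = 2x and D∥x∥_ℓ₂² = 2x^T.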

Example 4.

Suppose A ∈ ℂ^(m × n), and x = [x₁, …, x_n]^T ∈ ℂⁿ with x_i = a_i + jb_i ∈ ℂ, a_i, b_i ∈ ℝ. What is D(Ax)?

Solution:
Since f(x)=Ax : ℂⁿ → ℂ^m, we have \begin{equation} D(Ax) = \left[ \frac{\partial (Ax)}{\partial x_1}, \dots, \frac{\partial (Ax)}{\partial x_n} \right]. \nonumber \end{equation} Since Ax ∈ ℂ^m, we express it as \begin{equation} Ax = \left[ \begin{array}{c} (Ax)_1 \\ \vdots \\ (Ax)_m \end{array} \right] = \left[ \begin{array}{c} \sum_{i=1}^n A_{1i}x_i \\ \vdots \\ \sum_{i=1}^n A_{mi}x_i \end{array} \right], \nonumber \end{equation} and it follows that \begin{equation} \frac{\partial (Ax)}{\partial x_1} = \left[ \begin{array}{c} \frac{\partial (Ax)_1}{\partial x_1} \\ \vdots \\ \frac{\partial (Ax)_m}{\partial x_1} \end{array} \right] = \left[ \begin{array}{c} A_{11} \\ \vdots \\ A_{m1} \end{array} \right]. \nonumber \end{equation} Using the expression above, we write the derivative of Ax as \begin{equation} D(Ax) = \left[ \begin{array}{ccc} \frac{\partial (Ax)_1}{\partial x_1} & \cdots & \frac{\partial (Ax)_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial (Ax)_m}{\partial x_1} & \cdots & \frac{\partial (Ax)_m}{\partial x_n} \end{array} \right] = \left[ \begin{array}{ccc} A_{11} & \cdots & A_{1n} \\ \vdots & \ddots & \vdots \\ A_{m1} & \cdots & A_{mn} \end{array} \right] = A. \nonumber \end{equation}

Blog Post 9

Rachael Sharp

1 $\underline{\text{An Introduction to Graph Theory}}$ The concepts of graph theory go all the way back to the eighteenth century when, in 1736, Euler published what is believed to be the first paper on the subject. It contained a famous problem known as the Königsberg Bridges Problem. This is a puzzle which considers the question of whether or not a person, starting from their home, could pass over each of the seven bridges that crossed the Pregel River exactly once before returning home. Euler was able to reformulate the problem in a way that allowed him to lay the foundation of graph theory. To solve the puzzle, Euler replaced the land masses with vertices and let edges represent the bridges. The mathematical structure that emerged from his work is now called a graph.

Adjacency Matrix

Rikki

The adjacency matrix is used in discrete mathematics to record which vertices of a graph are joined by an edge and, through its powers, to count the number of ways in which we can walk from one vertex to another within the graph. Any graph can be represented by an adjacency matrix whose rows and columns are both labeled with the graph's vertices. We write $\left(i,j\right)$ for the entry in the $i^{th}$ row and $j^{th}$ column, which counts the number of edges between vertex $i$ and vertex $j$. We also say that $a_{i,j}$ represents the number in row $i$, column $j.$ For a simple graph, every entry of the adjacency matrix is either a $0$ or a $1.$ To decide which entry to write in the matrix, we use a $0$ if vertex $i$ is not adjacent to vertex $j$ and a $1$ if vertex $i$ is adjacent to vertex $j.$
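
As a small illustration of the rule above, the following sketch builds the adjacency matrix of a simple undirected graph from a list of edges. The function name `adjacency_matrix`, the zero-based vertex labels and the example graph (a cycle on four vertices) are assumptions made for this example.

```python
def adjacency_matrix(n, edges):
    """Return the n x n adjacency matrix of a simple undirected graph.

    Vertices are labeled 0..n-1 and each edge is a pair (i, j).
    """
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = 1  # vertex i is adjacent to vertex j
        a[j][i] = 1  # and, since the graph is undirected, j is adjacent to i
    return a


if __name__ == "__main__":
    # A cycle on four vertices: 0 - 1 - 2 - 3 - 0
    cycle = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
    for row in cycle:
        print(row)
```

Because the graph is undirected, the matrix is symmetric, and the sum of row $i$ is the degree of vertex $i$.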

Matrices in Graph Theory

Maritza

Some of the most important matrices used in graph theory are the adjacency matrix and the transition matrix. The rows and columns of an adjacency matrix are indexed by the vertices of the graph, and each entry is a $0$ or a $1$ depending on whether the corresponding pair of vertices is adjacent. We label an entry by $\left(i,j\right)$, where $i$ is the row and $j$ is the column. Adjacency matrices can also be used to find the number of walks between vertices. To do this we raise the matrix to the power $L$, where $L$ is the length of the walk, and read off the $\left(i,j\right)$ entry, which is the number of walks of length $L$ from vertex $i$ to vertex $j$.
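
The walk-counting rule can be tried on a small example. The sketch below raises the adjacency matrix of a triangle to the power $L$ using plain lists; the helper names `mat_mul` and `mat_pow` and the choice of graph are assumptions made for illustration.

```python
def mat_mul(a, b):
    """Multiply two square integer matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(a, power):
    """Return a raised to a non-negative integer power."""
    n = len(a)
    # Start from the identity matrix: the only walk of length 0 stays put.
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(power):
        result = mat_mul(result, a)
    return result


if __name__ == "__main__":
    # Triangle on vertices 0, 1, 2: every pair of vertices is adjacent.
    triangle = [[0, 1, 1],
                [1, 0, 1],
                [1, 1, 0]]
    for row in mat_pow(triangle, 3):
        print(row)
```

In the printed matrix, the entry in row $i$, column $j$ is the number of walks of length 3 from vertex $i$ to vertex $j$ in the triangle.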

Assignment 2

Xavier Holt

Fat Points

Proof by contradiction. Assume that the line segment between points $P,Q$ has maximum pairwise distance, and that $Q$ does not lie on a vertex. Let the point on the boundary of our hull found by extending the line $PQ$ be denoted $Q'$. This boundary segment is defined between two vertices on our convex hull which we refer to as $A$ and $B$. See $\mathbf{Fig. 1}$ for clarification.

Q Lies on the Interior of the Hull

Clearly $|PQ'|>|PQ|$. In the following section we show that there is always a line segment longer than $|PQ'|$. By the transitive property of inequality this segment must also be longer than $|PQ|$, a contradiction.

Q Lies on a Hull Edge

Welcome to Authorea!

Preeti

Hey, welcome. Double click anywhere on the text to start writing. In addition to simple text you can also add text formatted in boldface, italic, and yes, math too: E = mc²! Add images by drag’n’drop or click on the “Insert Figure” button.

Blog Post 8

Rachael Sharp

1 $\underline{\text{Parity}}$

Parity, in terms of mathematics, describes the classification of an integer as either even or odd. An even number is defined as an integer that is divisible by 2 while an odd number is one that is not. A more formal definition states that an even number is an integer n of the form n = 2k where k is an integer. On the other hand, an odd number is an integer of the form n = 2k + 1. In set notation we see:

\[ \text{Even} \hspace{0.5mm} = \hspace{0.5mm} \{2k : k \in \mathbb{Z}\} \]

\[ \text{Odd} \hspace{0.5mm} = \hspace{0.5mm} \{2k+1 : k \in \mathbb{Z}\} \]

In number theory, the idea of parity allows us to solve some mathematical problems simply by making note of odd and even numbers. In the same way, the impossibility of some mathematical constructions can be proven. For example, consider the following question:

The impact of boreal wildfires on carbon and nitrogen dynamics: the interplay between biotic and abiotic processes

Gustaf Granath

Purpose and aims

Wildfires are a natural phenomenon but human activities are altering both the driving factors (climate) and the vulnerability (land-use factors) of ecosystems, increasing both frequency and severity of fire impacts. This is an issue of concern given that wildfires play a major role in the global carbon cycle by affecting carbon and nitrogen storage in ecosystems. Yet, our knowledge of early post-fire carbon (C) and nitrogen (N) (hereafter abbreviated as CN) dynamics has been severely limited by the lack of cross-scale (from soil to plant to ecosystem) and cross-landscape (wetlands to uplands, managed and unmanaged land) studies. Understanding the mechanisms causing variability in CN dynamics (e.g., CN accumulation) in heterogeneous landscapes is critical for predicting changes in C and N storage with more frequent disturbance. Given this immediate research need, I propose an ambitious research program to investigate the impact of wildfires on the C and N cycle in the boreal landscape, capitalizing on a recent stand-replacing wildfire in Sweden. With an array of paired pre- and post-fire data, which is rare in wildfire ecosystem research, I aim to address whether pre-disturbance and initial post-disturbance conditions can be used to formulate predictions of post-disturbance ecosystem development. I will employ a novel multidisciplinary framework, which integrates ecological processes, such as plant community development, into the biogeochemical processes. This much-needed integration makes it possible to improve and add new mechanisms to current ecosystem models and to answer under what conditions the system is most vulnerable to change under frequent and severe wildfires. Three question-based work packages are described below as the basis of this wildfire research program:

CN losses. Where in the landscape do the largest C and N losses occur, and what factors control losses? How large are CN combustion losses relative to C transformed into charcoal and hydrologically-exported CN following fire?
CN pool development. What is the relative importance of abiotic (e.g. soil moisture, temperature) and biotic (e.g. plant traits) factors in generating variation in post-fire recovery rate of C and N pools at different spatial scales?
Vegetation development. What controls species and trait assembly post-fire? What is the role of niche-based processes (abiotic effects: environmental filtering, and biotic effects: legacy effects, regeneration traits) in contrast to neutral processes (stochasticity, priority effects)?

HST proposal 2016

Melanie Galloway

and 6 collaborators

Scientific Justification

The ‘Scientific Justification’ section of the proposal (see Section 9.1) should include a description of the scientific investigations that will be enabled by the final data products, and their importance

6 page limit, total proposal + figures can be 11.

One of the most powerful observational tools for constraining the physics governing galaxy formation and evolution is morphology. The structural features of a galaxy are known to have close relationships with its physical properties, e.g. the link between star formation rate and Hubble type \citep{Masters2010,Bundy2010,Schawinski2014} or spiral arms \citep{Willett2015}, bars and AGN \citep{Oh2012,Hao09,Galloway2015}, and bars and atomic gas content \citep{Masters2012}. [lots more possibilities of examples - help with more non-galaxy zoo examples?] It is known that the demographics of most morphological features are not, in general, constant as a function of redshift. This is not surprising, given that key elements involved in the formation of galaxies are also shown to change as the Universe evolves, e.g. star formation is known to peak at z ∼ 1 and drop steadily thereafter.

[few paragraphs of more descriptive examples of how galaxy physics is related to morphology + reasons for studying 0 < z < 2)]

Obtaining morphological data for such large numbers of galaxies is a unique challenge, in that to date there is no system that can produce both accurate and complete morphologies using automated methods. This problem is especially present with increasing redshift, for two reasons. First, images of distant galaxies are less resolved, making it difficult to distinguish finer features in the image. Second, galaxy shapes become increasingly irregular in the early Universe, due to increased merger rate and the clumpy nature of star formation. As large telescopes become more capable of imaging these distant galaxies, we continue to discover for the first time new large-scale structures which do not exist at low z; this creates a difficulty in defining an automated categorization for these unique types. Until automated methods overcome these challenges, visual classification by humans remains the most accurate method of measuring galaxy morphology, especially for galaxies beyond the local Universe.

Visual classification is of course not without its own challenges, which are time and efficiency. While humans produce more accurate and complete classifications than a computer, the time it takes to do so is overwhelming for the wealth of data becoming available by large surveys. The Galaxy Zoo project has developed a highly innovative method for bypassing the time drawback while maintaining the accuracy of visual classification. Displaying images of SDSS galaxies to volunteers via a simple and engaging web interface, www.galaxyzoo.org asks people to classify the images by eye. Within its first year, each of the ∼1 million SDSS galaxies had already been classified an average of 40 times through the efforts of hundreds of thousands of members of the general public providing ∼40 million classifications \citep{Lintott2008,Fortson2012}.

In 2010, Galaxy Zoo moved beyond the local Universe by including ∼100,000 HST galaxies in a project known as Galaxy Zoo: Hubble. All galaxies were classified at least 40 times by late 2012. This project enabled the first direct, morphologically accurate studies to be done on the evolution of galaxies, several of which have already been completed with the preliminary data, including bar fraction with redshift \citep{Cheung2014,Melvin2014} and passive disk fraction with redshift \citep{Galloway2016}. These represent only a small fraction of the numerous scientific investigations possible with these data; disk/spheroidal distinction, bars, spiral arms, clumpiness, and bulge dominance are a portion of the morphological information provided by this catalog (for the full list see Figure [fig:decision tree]).

Our aim with this proposal is to develop the next phase of Galaxy Zoo: Hubble, which we will hereafter refer to as Galaxy Zoo: Hubble 2 (GZH2). The motivation for extending this project is twofold. First, although the visual classification methods have been immensely successful thus far in obtaining robust morphologies for large (∼100,000) samples of galaxies, automation methods have improved since the first release in the form of powerful machine-learning algorithms. These alone are still not independently capable of accurate classification for galaxies at all redshifts; however, combining these methods with the current system of human classifications has been shown to reduce the classification time of galaxies by 80% (can we cite something/ provide a figure Melanie?), thereby significantly improving both the efficiency and accuracy of GZH classifications. The details for this process are explained in full in the Analysis Plan. Second, in addition to the original GZH galaxies, an additional XX,XXX HST galaxies will be added to the project to be classified by this new method.

By combining machine-learning with human classifications, GZH2 will provide the most morphologically accurate data for the widest redshift range (to z ∼ 1.2) currently available. These data will enable countless new science projects on galaxy evolution at a level of accuracy that has not previously been possible. With the funding from this proposal, our team will focus on two science cases: clumpy galaxies (need better zinger description) and the mass-metallicity relation.

Homework 3

Antti Rantala

and 3 collaborators

To construct the L2 MNE operator, we used the formula found in the lecture slides: $$P_{MNE}=RL^T(LRL^T + \lambda C_n)^{-1}$$

The following Matlab script was used in the simple case without depth weighting:

load ex3_data.mat

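%% MNE
L2 = L*L';                               % L*R*L' with R = identity (source covariance)
snr = sqrt(n_trial);                     % SNR estimate taken from the number of trials
lambda = trace(L2)/(trace(Cn)*snr^2);    % regularization parameter
ev = eigs(L2+lambda*Cn, rank(data1));    % leading eigenvalues, as many as the rank of the data
tol = ev(end);                           % truncation tolerance for the pseudoinverse
p_mne = L'*pinv(L2+lambda*Cn, tol);      % P_MNE = R*L'*(L*R*L' + lambda*Cn)^(-1)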

src1 = p_mne*data1;
src2 = p_mne*data2;

Here an identity matrix was used for the source covariance matrix.

Next, to include depth weighting, we modified the source covariance matrix:

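%% Depth-Weighted MNE
W_i = diag(sqrt(sum(L,1).^2./306));              % diagonal source covariance built from the columns of L
lambda = trace(L*W_i*L')/(trace(Cn)*snr^2);      % regularization parameter, as in the unweighted case
ev = eigs(L*W_i*L'+lambda*Cn, rank(data1));      % leading eigenvalues
tol = ev(end);                                   % truncation tolerance for the pseudoinverse
pw_mne = W_i*L'*pinv(L*W_i*L'+lambda*Cn, tol);   % P_MNE with R = W_i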

srcw1 = pw_mne*data1; 
srcw2 = pw_mne*data2;

Finally, we constructed a beamformer operator for the data:

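%% Beamformer
N = 306;                                 % number of measurement channels (columns of p_bf)
M = 5124;                                % presumably the number of source points (not used below)
ev = eigs(Cd, rank(data1));              % leading eigenvalues of the data covariance
tol = ev(end);                           % truncation tolerance for the pseudoinverse
p_bf = (pinv(Cd,tol)*L)';                % each row is (Cd^-1 * L_theta)'
denominator = diag(p_bf*L);              % L_theta' * Cd^-1 * L_theta for each source point
p_bf = p_bf./repmat(denominator,1, N);   % divide each row by its denominator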

srcb1 = p_bf*data1; 
srcb2 = p_bf*data2;

using the following formula from the lecture slides $$P_{BF,\theta} = \dfrac{(C_\mathrm{d}^{-1} L_\theta)^T}{L_\theta^T C_\mathrm{d}^{-1} L_\theta},$$ where L_θ is the gain vector for source point θ.

We wanted to visualize the source estimates as a function of time. We achieved this with the following script:

%% Plots
close all
vis_surface_data(srcb1(:,115), 0.1, max(srcb1(:)), anat_decim)

%%
close all
figure(3)
hold on
vv = var(data1,[], 2);
plot(timeaxis,data1(vv > 0.3*max(vv),:))
plot([1,1]*timeaxis(115),ylim(gca()), 'k--')
plot([1,1]*timeaxis(140),ylim(gca()), 'k--')
plot([1,1]*timeaxis(190),ylim(gca()), 'k--')
xlabel('time [s]')
axis tight

%%
close all
figure(4)

hold on
vv = var(data2,[], 2);
plot(timeaxis,data2(vv > 0.3*max(vv),:))
plot([1,1]*timeaxis(125),ylim(gca()), 'k--')
plot([1,1]*timeaxis(190),ylim(gca()), 'k--')
xlabel('time [s]')
axis tight

Key Cryptography

Maritza

RSA encryption can be used for many things, such as keeping important messages secure. It is very difficult to break or decode messages that have been encrypted with RSA without the private key. There are a few steps that one must go through in order to encrypt and decrypt a message.

We can look at a few variables that are needed in the RSA encryption process:

$e$ is the public exponent; together with $n$ it forms the public key

$d$ is the value used for decoding and is known only to the receiver

$p$ and $q$ are the primes

$n=pq$ is the product of the two primes

$M$ is the original message

$C=M^e$ (mod $n$) is used to encrypt messages

$M=C^d$ (mod $n$) is used to decrypt messages
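
The variables above can be put together in a toy example. This is a minimal sketch under stated assumptions: the primes are far too small to be secure, there is no padding, and $d$ is computed as the inverse of $e$ modulo $(p-1)(q-1)$, which is the usual textbook choice but is not spelled out in the list above. The function names are hypothetical.

```python
from math import gcd


def make_keys(p, q, e):
    """Return (n, e, d) for primes p, q and a chosen public exponent e."""
    n = p * q
    phi = (p - 1) * (q - 1)  # assumption: textbook RSA with Euler's totient
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to (p - 1) * (q - 1)")
    d = pow(e, -1, phi)      # modular inverse of e (requires Python 3.8+)
    return n, e, d


def encrypt(m, e, n):
    """C = M^e (mod n); the message M must satisfy 0 <= M < n."""
    return pow(m, e, n)


def decrypt(c, d, n):
    """M = C^d (mod n)."""
    return pow(c, d, n)


if __name__ == "__main__":
    n, e, d = make_keys(61, 53, 17)  # toy primes, for illustration only
    message = 65
    cipher = encrypt(message, e, n)
    print(n, e, d, cipher, decrypt(cipher, d, n))
```

Anyone who knows $n$ and $e$ can run `encrypt`, but `decrypt` needs $d$, which is why only the receiver should know it.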