Public Articles
SMMP - Stochastic Methods for Molecular Properties
Possible titles:
Stochastic Methods for Molecular Properties (SMMP)
Stochastic Methods for Chiroptical Properties (SMCP or ChiroStoch)
Deterministic methods need large Hilbert spaces for effective expansions of the many-electron wave function.
This expansion is, however, largely redundant \cite{Ivanic_2001}.
Stochastic algorithms are highly parallelizable in the number of walkers.
I will develop my skills in parallel programming techniques by developing this project.
Research questions:
Objectives of the project:
The calculation of molecular properties with high accuracy and for systems of relevant size
Devise the appropriate stochastic approach to the solution of response equations
The creation of the appropriate software toolbox with good scalability.
General background on quantum chemistry:
State of the art:
QMC:
Properties by QMC
Chiroptical properties:
Response theory:
Problems to address:
TODO:
King Chicken Theorem
Detailed Reviewer Responses
and 3 collaborators
We would like to thank the reviewers for their insightful comments. The major points that have been addressed are as follows:
It was not our intention to give the impression that one needs to scan human calibration phantoms at each site to properly power a multisite study with nonstandardized parameters, which is very costly. The statistical model which takes MRI bias into account has been emphasized instead. The bias that was measured and validated via calibration served to corroborate the scaling assumption of the statistical model. For other researchers planning multisite studies, the statistical model we proposed with the biases we reported should help plan and power a study.
Our measurements have been compared with other harmonization efforts, specifically \cite{cannon2014, jovicich2013brain} and \cite{Schnack_2004}.
The scanning parameters of our consortium have been better specified.
The independence assumption between the unobserved effect and the scaling factor for a particular site has been addressed. Specifically, we emphasized that this assumption could hold for MS patients based on our experiment. We recommend validating this assumption for other situations by scanning human phantoms, and we provide the equation of variance without the independence assumption for readers.
Blog Post 10
1 $\underline{\text{Trees-A Branch of Discrete Mathematics}}$ Trees provide poets with inspiration as they sway through the breeze and their leaves, bursting with color, rustle in the wind. It is no wonder, then, that mathematicians coined the term “tree” in describing special classes of structured graphs. One author, Joe Malkevitch, makes it his goal to “convince you (readers) that mathematical trees are no less lovely than their biological counterparts.”
In discrete mathematics, and more specifically graph theory, a tree is a connected graph with no cycles. A graph with no cycles that is not necessarily connected is called a forest, so each connected component of a forest is a tree. In addition, a vertex of degree 1 is called a leaf. These kinds of mathematical structures were first studied by the mathematician Arthur Cayley. In 1889 Cayley published a formula stating that for n ≥ 1, the number of labeled trees with n vertices is $n^{n-2}$.
A few other properties of trees include the following:
Given two vertices, x and y, there is a unique path from x to y
If we remove any edge of a tree, the graph is no longer connected
If a tree has n vertices, then it has n − 1 edges
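Cayley's formula and the edge-count property can be checked by brute force for small n. The following is a minimal sketch, not part of the original post: the helper names `is_tree` and `count_labeled_trees` are hypothetical, and the approach (enumerating every set of n − 1 edges on n labeled vertices) is only practical for very small graphs.

```python
from itertools import combinations


def is_tree(n, edges):
    """Return True if the graph on vertices 0..n-1 with these edges is a tree."""
    if len(edges) != n - 1:
        return False  # a tree on n vertices has exactly n - 1 edges
    parent = list(range(n))  # union-find forest used to detect cycles

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False  # this edge would close a cycle
        parent[root_u] = root_v
    # n - 1 edges and no cycle imply a single connected component
    return True


def count_labeled_trees(n):
    """Count labeled trees on n vertices by testing every (n-1)-edge subset."""
    all_edges = list(combinations(range(n), 2))
    return sum(is_tree(n, subset) for subset in combinations(all_edges, n - 1))


for n in range(2, 7):
    print(n, count_labeled_trees(n), n ** (n - 2))
```

Each printed row shows n, the brute-force count, and the value of Cayley's formula, so the two columns can be compared directly.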
The concept of mathematical trees has applications in various fields including science, the enumeration of saturated hydrocarbons, the study of electrical circuits, and many more (Harary, 1994, p. 4).
Functional consequence of SNPs on the Tuberculosis drug metabolising enzyme, human arylamine N-acetyltransferase 1
Tournament Graphs
Autopledge
and 2 collaborators
Even when good coding practices are followed, arbitrary code execution bugs can still exist. By leveraging pledge(2) system calls and a static analysis framework, we attempt to mitigate these bugs by automatically inserting pledge statements. Although an algorithm was devised to do this, time limitations prevented its full implementation.
Supervised Learning: Classification and Regression
and 2 collaborators
\label{RegSection}
Contactless Remote Induction of Shear Waves in Soft Tissues Using a Transcranial Magnetic Stimulation Device
Direct measurement of \(\alpha_{\rm QED}(m_{\rm Z}^{2})\) at the FCC-ee
and 7 collaborators
When the measurements from the FCC-ee become available, an improved determination of the standard-model “input” parameters will be needed to fully exploit the new precision data towards either constraining or fitting the parameters of beyond-the-standard-model theories. Among these input parameters is the electromagnetic coupling constant estimated at the Z mass scale, $\alpha_{\rm QED}(m^2_{\rm Z})$. The measurement of the muon forward-backward asymmetry at the FCC-ee, just below and just above the Z pole, can be used to make a direct determination of $\alpha_{\rm QED}(m^2_{\rm Z})$ with an accuracy deemed adequate for an optimal use of the FCC-ee precision data.
MATH stuff
Here we provide a definition for the ’complex’ derivative of a real-valued function $f : \mathbb{C}^n \to \mathbb{R}$ with respect to its complex variables. The notation $f : \mathbb{C}^n \to \mathbb{R}$ means “f is a mapping (or function) from the set of column vectors of size n with complex components (denoted $\mathbb{C}^n$) into the set of real numbers (denoted $\mathbb{R}$).”
The complex derivative of a real-valued function $f$ with respect to $x = a + jb \in \mathbb{C}$, $a, b \in \mathbb{R}$, is defined as \begin{equation} \frac{df}{dx} = \frac{df}{da} + j\frac{df}{db}. \end{equation}
Given $x = a + jb \in \mathbb{C}$, $a, b \in \mathbb{R}$, what is $D|x|$?
Solution:
We have \begin{equation}
|x| = \sqrt{x^*x} = \sqrt{(a-jb)(a+jb)} = \sqrt{a^2 + b^2}. \nonumber
\end{equation} Applying the definition of the complex derivative yields \begin{eqnarray}
\frac{d|x|}{dx} &=& \frac{d|x|}{da} + j\frac{d|x|}{db} \nonumber\\
&=& \frac{2a}{2\sqrt{a^2 + b^2}} + j\frac{2b}{2\sqrt{a^2 + b^2}} \nonumber\\
&=& \frac{a}{\sqrt{a^2 + b^2}} + j\frac{b}{\sqrt{a^2 + b^2}} \nonumber\\
&=& \frac{x}{|x|}. \nonumber
\end{eqnarray}
Given $x = a + jb \in \mathbb{C}$, $a, b \in \mathbb{R}$, what is $D|x|^2$?
Solution:
We have \begin{equation}
|x|^2 = x^*x = (a-jb)(a+jb) = a^2 + b^2. \nonumber
\end{equation} Applying the definition of the complex derivative yields \begin{eqnarray}
\frac{d|x|^2}{dx} &=& \frac{d|x|^2}{da} + j\frac{d|x|^2}{db} \nonumber\\
&=& 2a + j2b \nonumber\\
&=& 2x. \nonumber
\end{eqnarray} Suppose $f : \mathbb{C}^n \to \mathbb{R}$ is a real-valued function and $x \in {\mathop{\bf int}}{\mathop{\bf dom}}f$. The derivative $Df(x)$ is a $1 \times n$ matrix (a row vector), defined by \begin{equation}
\label{eqn:derivative}
Df(x) = \left[ \frac{\partial f}{\partial x_1}(x), \dots, \frac{\partial f}{\partial x_n}(x) \right].
\end{equation}
Given $x = [x_1, \dots, x_n]^T \in \mathbb{C}^n$ with $x_i = a_i + jb_i \in \mathbb{C}$, $a_i, b_i \in \mathbb{R}$, what is $D\|x\|_{\ell_2}^2$?
Solution:
We have \begin{eqnarray}
\|x\|_{\ell_2}^2 &=& \sum_{i=1}^n |x_i|^2 = \sum_{i=1}^n x_i^*x_i \nonumber\\
&=& \sum_{i=1}^n (a_i +jb_i)^*(a_i +jb_i) \nonumber\\
&=& \sum_{i=1}^n (a_i -jb_i)(a_i +jb_i) \nonumber\\
&=& \sum_{i=1}^n (a_i^2 +b_i^2). \nonumber
\end{eqnarray} We first look at the first element of Equation (\ref{eqn:derivative}) with $f(x) = \|x\|_{\ell_2}^2$. Applying the definition of the complex derivative gives \begin{eqnarray}
\frac{\partial \|x\|_{\ell_2}^2}{\partial x_1} &=& \frac{\partial \|x\|_{\ell_2}^2}{\partial a_1} +
j\frac{\partial \|x\|_{\ell_2}^2}{\partial b_1} \nonumber\\
&=& \frac{\partial }{\partial a_1} \left(\sum_{i=1}^n (a_i^2 +b_i^2)\right) +
j\frac{\partial }{\partial b_1} \left(\sum_{i=1}^n (a_i^2 +b_i^2)\right) \nonumber\\
&=& 2a_1 + j2b_1 \nonumber\\
&=& 2x_1. \nonumber
\end{eqnarray} Therefore, it follows that \begin{eqnarray}
Df(x) &=& \left[ \frac{\partial \|x\|_{\ell_2}^2}{\partial x_1}, \dots, \frac{\partial \|x\|_{\ell_2}^2}{\partial x_n} \right] \nonumber\\
&=& \left[2x_1, \ldots, 2x_n \right] \nonumber\\
&=& 2x^T. \nonumber
\end{eqnarray}
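The results above can be checked numerically. The sketch below is an illustration only, not part of the original notes: it approximates the derivative with respect to the real part and the imaginary part by central differences and combines them as in the definition. The function names `complex_derivative` and `gradient` and the step size `h` are assumptions made for this example.

```python
def complex_derivative(f, x, h=1e-6):
    """Estimate df/da + j*df/db for a real-valued f at x = a + jb."""
    df_da = (f(x + h) - f(x - h)) / (2 * h)            # perturb the real part
    df_db = (f(x + 1j * h) - f(x - 1j * h)) / (2 * h)  # perturb the imaginary part
    return df_da + 1j * df_db


def gradient(f, x, h=1e-6):
    """Estimate the row vector [df/dx_1, ..., df/dx_n] for f: C^n -> R."""
    grad = []
    for i in range(len(x)):
        def f_i(t, i=i):
            # vary only the i-th component, keep the others fixed
            return f(x[:i] + [t] + x[i + 1:])
        grad.append(complex_derivative(f_i, x[i], h))
    return grad


def squared_norm(x):
    """Squared l2 norm: the sum of |x_i|^2."""
    return sum(abs(xi) ** 2 for xi in x)


x = [1 + 2j, -0.5 + 0.25j, 3 - 1j]
numeric = gradient(squared_norm, x)
analytic = [2 * xi for xi in x]  # the entries of 2 x^T derived above
for n_i, a_i in zip(numeric, analytic):
    print(n_i, a_i)
```

The same helper can be applied to the scalar examples, for instance `complex_derivative(abs, 3 + 4j)` should be close to `(3 + 4j) / 5`.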
Suppose $A \in \mathbb{C}^{m \times n}$, and $x = [x_1, \dots, x_n]^T \in \mathbb{C}^n$ with $x_i = a_i + jb_i \in \mathbb{C}$, $a_i, b_i \in \mathbb{R}$. What is $D(Ax)$?
Solution:
Since $f(x) = Ax : \mathbb{C}^n \to \mathbb{C}^m$, we have \begin{equation}
D(Ax) = \left[ \frac{\partial (Ax)}{\partial x_1}, \dots, \frac{\partial (Ax)}{\partial x_n} \right]. \nonumber
\end{equation} Since $Ax \in \mathbb{C}^m$, we express it as \begin{equation}
Ax = \left[ \begin{array}{c} (Ax)_1 \\ \vdots \\ (Ax)_m \end{array} \right]
= \left[ \begin{array}{c} \sum_{i=1}^n A_{1i}x_i \\ \vdots \\ \sum_{i=1}^n A_{mi}x_i \end{array} \right], \nonumber
\end{equation} and it follows that \begin{equation}
\frac{\partial (Ax)}{\partial x_1} = \left[ \begin{array}{c}
\frac{\partial (Ax)_1}{\partial x_1} \\ \vdots \\ \frac{\partial (Ax)_m}{\partial x_1}
\end{array} \right]
= \left[ \begin{array}{c} A_{11} \\ \vdots \\ A_{m1} \end{array} \right]. \nonumber
\end{equation} Using the expression above, we write the derivative of Ax as \begin{equation}
D(Ax) = \left[ \begin{array}{ccc}
\frac{\partial (Ax)_1}{\partial x_1} & \cdots & \frac{\partial (Ax)_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial (Ax)_m}{\partial x_1} & \cdots & \frac{\partial (Ax)_m}{\partial x_n}
\end{array} \right]
= \left[ \begin{array}{ccc}
A_{11} & \cdots & A_{1n} \\
\vdots & \ddots & \vdots \\
A_{m1} & \cdots & A_{mn}
\end{array} \right]
= A. \nonumber
\end{equation}
Blog Post 9
1 $\underline{\text{An Introduction to Graph Theory}}$ The concepts of graph theory go all the way back to the eighteenth century when, in 1736, Euler published what is believed to be the first paper on the subject. It contained a famous problem known as the Königsberg Bridges Problem. This puzzle asks whether a person, starting from their home, could cross each of the seven bridges over the Pregel River exactly once before returning home. Euler was able to recast the problem in a way that allowed him to lay the foundation of graph theory. To solve the puzzle, Euler replaced the land masses with vertices and let edges represent the bridges. The mathematical structure that emerged from his work is now called a graph.
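Euler's modeling step can be sketched in a few lines. The example below is an illustration under stated assumptions: the vertex labels A to D are hypothetical names for the four land masses, and the test used is the standard parity criterion (a connected graph has a closed walk using every edge exactly once only if every vertex has even degree), which the passage above does not state explicitly.

```python
from collections import Counter

# Assumed labeling of the four land masses:
# A = central island, B = north bank, C = south bank, D = eastern land mass.
# Each pair is one of the seven bridges (parallel edges are listed twice).
bridges = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]


def vertex_degrees(edges):
    """Count how many bridge ends touch each land mass."""
    degrees = Counter()
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return dict(degrees)


def all_degrees_even(edges):
    """Parity condition for a closed walk that uses every edge exactly once."""
    return all(d % 2 == 0 for d in vertex_degrees(edges).values())


print(vertex_degrees(bridges))
print(all_degrees_even(bridges))
```

Because every land mass touches an odd number of bridges, the parity condition fails and the desired round trip does not exist.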
Adjacency Matrix
Matrices in Graph Theory
Assignment 2
Proof by contradiction. Assume that the line segment between points \(P,Q\) has maximum pairwise distance, and that \(Q\) does not lie on a vertex. Let the point on the boundary of our hull found by extending the line \(PQ\) be denoted \(Q'\). This boundary segment is defined between two vertices on our convex hull which we refer to as \(A\) and \(B\). See \(\mathbf{Fig. 1}\) for clarification.
Clearly \(|PQ'|>|PQ|\). In the following section we show that there is always a line segment longer than \(|PQ'|\). By the transitive property of inequality, this segment must also be longer than \(|PQ|\), a contradiction.
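The claim can be spot-checked on a finite point set. The sketch below is an empirical illustration, not a substitute for the proof: it computes the convex hull with Andrew's monotone chain algorithm (a technique chosen for this example, not taken from the assignment), finds the farthest pair by brute force, and checks that both endpoints are hull vertices. The function names and the random test data are assumptions.

```python
import random
from itertools import combinations


def cross(o, a, b):
    """Twice the signed area of triangle o, a, b (positive if counter-clockwise)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Hull vertices via Andrew's monotone chain; collinear boundary points are dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def farthest_pair(points):
    """Brute-force search over all pairs for the maximum pairwise distance."""
    return max(combinations(points, 2), key=lambda pq: squared_distance(*pq))


random.seed(0)
points = [(random.randint(-50, 50), random.randint(-50, 50)) for _ in range(200)]
hull_vertices = set(convex_hull(points))
p, q = farthest_pair(points)
print(p in hull_vertices and q in hull_vertices)
```

Integer coordinates keep the arithmetic exact, so the check does not depend on floating-point tolerance.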
Blog Post 8
1 $\underline{\text{Parity}}$
Parity, in terms of mathematics, describes the classification of an integer as either even or odd. An even number is defined as an integer that is divisible by 2 while an odd number is one that is not. A more formal definition states that an even number is an integer n of the form n = 2k where k is an integer. On the other hand, an odd number is an integer of the form n = 2k + 1. In set notation we see:
\[ \text{Even} \hspace{0.5mm} = \hspace{0.5mm} \{2k : k \in \mathbb{Z}\} \]
\[ \text{Odd} \hspace{0.5mm} = \hspace{0.5mm} \{2k+1 : k \in \mathbb{Z}\} \]
In number theory, the idea of parity allows us to solve some mathematical problems simply by making note of odd and even numbers. In the same way, the impossibility of some mathematical constructions can be proven. For example, consider the following question:
The impact of boreal wildfires on carbon and nitrogen dynamics: the interplay between biotic and abiotic processes
Wildfires are a natural phenomenon, but human activities are altering both the driving factors (climate) and the vulnerability (land-use factors) of ecosystems, increasing both the frequency and severity of fire impacts. This is an issue of concern given that wildfires play a major role in the global carbon cycle by affecting carbon and nitrogen storage in ecosystems. Yet, our knowledge of early post-fire carbon (C) and nitrogen (N) (hereafter abbreviated as CN) dynamics has been severely limited by the lack of cross-scale (from soil to plant to ecosystem) and cross-landscape (wetlands to uplands, managed and unmanaged land) studies. Understanding the mechanisms causing variability in CN dynamics (e.g., CN accumulation) in heterogeneous landscapes is critical for predicting changes in C and N storage with more frequent disturbance. Given this immediate research need, I propose an ambitious research program to investigate the impact of wildfires on the C and N cycles in the boreal landscape, capitalizing on a recent stand-replacing wildfire in Sweden. With an array of paired pre- and post-fire data, which is rare in wildfire ecosystem research, I aim to address whether pre-disturbance and initial post-disturbance conditions can be used to formulate predictions of post-disturbance ecosystem development. I will employ a novel multidisciplinary framework, which integrates ecological processes, such as plant community development, into the biogeochemical processes. This much-needed integration makes it possible to improve and add new mechanisms to current ecosystem models and to answer under what conditions the system is most vulnerable to change under frequent and severe wildfires. Three question-based work packages are described below as the basis of this wildfire research program:
CN losses. Where in the landscape do the largest C and N losses occur, and what factors control losses? How large are CN combustion losses relative to C transformed into charcoal and hydrologically exported CN following fire?
CN pool development. What is the relative importance of abiotic (e.g. soil moisture, temperature) and biotic (e.g. plant traits) factors in generating variation in post-fire recovery rate of C and N pools at different spatial scales?
Vegetation development. What controls species and trait assembly post-fire? What is the role of niche-based processes (abiotic effects: environmental filtering, and biotic effects: legacy effects, regeneration traits) in contrast to neutral processes (stochasticity, priority effects)?
HST proposal 2016
and 6 collaborators
The ‘Scientific Justification’ section of the proposal (see Section 9.1) should include a description of the scientific investigations that will be enabled by the final data products, and their importance
6 page limit, total proposal + figures can be 11.
One of the most powerful observational tools for constraining the physics governing galaxy formation and evolution is morphology. The structural features of a galaxy are known to have close relationships with its physical properties, e.g. the link between star formation rate and Hubble type \citep{Masters2010,Bundy2010,Schawinski2014} or spiral arms \citep{Willett2015}, bars and AGN \citep{Oh2012,Hao09,Galloway2015}, bars and atomic gas content \citep{Masters2012}, [lots more possibilities of examples - help with more non-galaxy zoo examples?] It is known that the demographics of most morphological features are not, in general, constant as a function of redshift. This is not surprising, given that key elements involved in the formation of galaxies are also shown to change as the Universe evolves, e.g. star formation is known to peak at z ∼ 1 and drop steadily thereafter.
[few paragraphs of more descriptive examples of how galaxy physics is related to morphology + reasons for studying 0 < z < 2)]
Obtaining morphological data for such large numbers of galaxies is a unique challenge, in that to date there is no system that can produce both accurate and complete morphologies using automated methods. This problem is especially present with increasing redshift, for two reasons. First, images of distant galaxies are less resolved, making it difficult to distinguish finer features in the image. Second, galaxy shapes become increasingly irregular in the early Universe, due to increased merger rate and the clumpy nature of star formation. As large telescopes become more capable of imaging these distant galaxies, we continue to discover for the first time new large-scale structures which do not exist at low z; this creates a difficulty in defining an automated categorization for these unique types. Until automated methods overcome these challenges, visual classification by humans remains the most accurate method of measuring galaxy morphology, especially for galaxies beyond the local Universe.
Visual classification is of course not without its own challenges, which are time and efficiency. While humans produce more accurate and complete classifications than a computer, the time it takes to do so is overwhelming for the wealth of data being made available by large surveys. The Galaxy Zoo project has developed a highly innovative method for bypassing the time drawback while maintaining the accuracy of visual classification. Displaying images of SDSS galaxies to volunteers via a simple and engaging web interface, www.galaxyzoo.org asks people to classify the images by eye. Within its first year, each of the ∼1 million SDSS galaxies had already been classified an average of 40 times through the efforts of hundreds of thousands of members of the general public providing ∼40 million classifications \citep{Lintott2008,Fortson2012}.
In 2010, Galaxy Zoo moved beyond the local Universe by including ∼100,000 HST galaxies in a project known as Galaxy Zoo: Hubble. All galaxies were classified at least 40 times by late 2012. This project enabled the first direct, morphologically accurate studies of the evolution of galaxies, several of which have already been completed with the preliminary data, including bar fraction with redshift \citep{Cheung2014,Melvin2014} and passive disk fraction with redshift \citep{Galloway2016}. These represent only a small fraction of the scientific investigations possible with these data; disk/spheroidal distinction, bars, spiral arms, clumpiness, and bulge dominance are a portion of the morphological information provided by this catalog (for the full list see Figure [fig:decision tree]).
Our aim with this proposal is to develop the next phase of Galaxy Zoo: Hubble, which we will hereafter refer to as Galaxy Zoo: Hubble 2 (GZH2). The motivation for extending this project is twofold. First, although the visual classification methods have been immensely successful thus far in obtaining robust morphologies for large (∼100,000) samples of galaxies, automation methods have improved since the first release in the form of powerful machine-learning algorithms. These alone are still not capable of accurate classification for galaxies at all redshifts; however, combining these methods with the current system of human classifications has been shown to reduce the classification time of galaxies by 80% (can we cite something/ provide a figure Melanie?), thereby significantly improving both the efficiency and accuracy of GZH classifications. The details of this process are explained in full in the Analysis Plan. Second, in addition to the original GZH galaxies, an additional XX,XXX HST galaxies will be added to the project to be classified by this new method.
By combining machine learning with human classifications, GZH2 will provide the most morphologically accurate data for the widest redshift range (to z ∼ 1.2) currently available. These data will enable more new science projects on galaxy evolution than has ever been possible at this level of accuracy. With the funding from this proposal, our team will focus on two science cases: clumpy galaxies (need better zinger description) and the mass-metallicity relation.
Homework 3
and 3 collaborators
To construct the L2 MNE operator, we used the formula found in the lecture slides: $$P_{MNE}=RL^T(LRL^T + \lambda C_n)^{-1}$$
The following Matlab script was used in the simple case without depth weighting:
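load ex3_data.mat
%% MNE
% With an identity source covariance R, L*R*L' reduces to L*L'.
L2=L*L';
% SNR is set to the square root of the number of trials.
snr = sqrt(n_trial);
% Regularization parameter from the trace ratio of the signal and noise terms.
lambda = trace(L2)/(trace(Cn)*snr^2);
% Use the last of the rank(data1) largest-magnitude eigenvalues as the pinv tolerance.
ev = eigs(L2+lambda*Cn, rank(data1));
tol = ev(end);
% MNE operator P = L'*(L*L' + lambda*Cn)^(-1), applied to both data sets.
p_mne=L'*pinv(L2+lambda*Cn, tol);
src1 = p_mne*data1;
src2 = p_mne*data2;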
Here an identity matrix was used for the source covariance matrix.
Next, to include depth weighting, we modified the source covariance matrix:
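%% Depth-Weighted MNE
% Diagonal source covariance built from the column sums of L (306 = number of sensors).
W_i = diag(sqrt(sum(L,1).^2./306));
% Regularization and tolerance are chosen as before, with L*W_i*L' in place of L*L'.
lambda = trace(L*W_i*L')/(trace(Cn)*snr^2);
ev = eigs(L*W_i*L'+lambda*Cn, rank(data1));
tol = ev(end);
% Depth-weighted operator P = W_i*L'*(L*W_i*L' + lambda*Cn)^(-1).
pw_mne=W_i*L'*pinv(L*W_i*L'+lambda*Cn,tol);
srcw1 = pw_mne*data1;
srcw2 = pw_mne*data2;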
Finally, we constructed a beamformer operator for the data:
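%% Beamformer
N = 306;   % number of sensors
M = 5124;  % number of source points
% Truncated pseudoinverse of the data covariance Cd.
ev = eigs(Cd, rank(data1));
tol = ev(end);
% Numerator (Cd^(-1)*L)' for all source points at once (M-by-N).
p_bf = (pinv(Cd,tol)*L)';
% The denominators L_theta'*Cd^(-1)*L_theta are the diagonal of p_bf*L.
denominator = diag(p_bf*L);
% Normalize each row of the operator by its own denominator.
p_bf = p_bf./repmat(denominator,1, N);
srcb1 = p_bf*data1;
srcb2 = p_bf*data2;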
using the following formula from the lecture slides: $$P_{BF,\theta} = \dfrac{(C_\mathrm{d}^{-1} L_\theta)^T}{L_\theta^T C_\mathrm{d}^{-1} L_\theta},$$ where $L_\theta$ is the gain vector for source point $\theta$.
We wanted to visualize the source estimates as a function of time. We achieved this with the following script:
%% Plots
close all
vis_surface_data(srcb1(:,115), 0.1, max(srcb1(:)), anat_decim)
%%
close all
figure(3)
hold on
vv = var(data1,[], 2);
plot(timeaxis,data1(vv > 0.3*max(vv),:))
plot([1,1]*timeaxis(115),ylim(gca()), 'k--')
plot([1,1]*timeaxis(140),ylim(gca()), 'k--')
plot([1,1]*timeaxis(190),ylim(gca()), 'k--')
xlabel('time [s]')
axis tight
%%
close all
figure(4)
hold on
vv = var(data2,[], 2);
plot(timeaxis,data2(vv > 0.3*max(vv),:))
plot([1,1]*timeaxis(125),ylim(gca()), 'k--')
plot([1,1]*timeaxis(190),ylim(gca()), 'k--')
xlabel('time [s]')
axis tight
Key Cryptography
RSA encryption can be used for many things, such as keeping important messages secure. It is very difficult to break or decode a message that has been encrypted with RSA without the private key. There are a few steps that one must go through in order to encrypt and decrypt a message.
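As a rough orientation, the encrypt and decrypt steps can be sketched with toy numbers. This is a minimal illustration under stated assumptions, not a secure implementation: the primes 61 and 53, the exponent 17, and the function names are chosen for this example only, and real RSA uses much larger primes together with a padding scheme. The modular inverse via `pow(e, -1, phi)` requires Python 3.8 or later.

```python
from math import gcd


def make_keys(p, q, e):
    """Build a toy RSA key pair from two primes p, q and a public exponent e."""
    n = p * q                  # modulus shared by both keys
    phi = (p - 1) * (q - 1)    # Euler's totient of n
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to (p - 1) * (q - 1)")
    d = pow(e, -1, phi)        # private exponent: inverse of e modulo phi
    return (n, e), (n, d)


def encrypt(message, public_key):
    """Encrypt an integer message smaller than n with the public key."""
    n, e = public_key
    return pow(message, e, n)


def decrypt(ciphertext, private_key):
    """Recover the integer message with the private key."""
    n, d = private_key
    return pow(ciphertext, d, n)


# Hypothetical toy parameters, far too small for real use.
public_key, private_key = make_keys(61, 53, 17)
message = 65
ciphertext = encrypt(message, public_key)
print(ciphertext, decrypt(ciphertext, private_key))
```

Anyone may hold the public key and encrypt with it, while only the holder of the private exponent can reverse the operation.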
We can look at a few variables that are needed in the RSA encryption process: