The purpose of this homework is to explore the Monte Carlo Algorithm and apply it to the simplified protein folding model in 2D. MONTE CARLO METHOD Monte Carlo Method uses the randomly generated possible solutions to a certain problem in a solution space and test its degree of goodness based on certain physical requirements. The ways of generating the possible solutions are usually two. First, we can generate the possible solutions totally at random. For example, we use random number generator to do Monte Carlo Integration. Second, we can generate the possible solutions from the previous step by randomly changing some parameters of the previous one. We will use the later one to generate our 2D protein structures in this homework. The general flow of Monte Carlo Method is shown as follows. Note that we use the term “conformation space” instead of “solution space” since we are talking about protein structures here. 1. Start from a initial state in the conformation space. 2. Randomly change the previous state, subjecting to requirement 1. 3. Determine the degree of goodness by criterion 2. 4. Accept/ reject this state by physical rule 3. 5. If it is accepted, pass this state and repeat 2 through 5 for certain number of steps. 6. If it is rejected, do not pass the state and repeat 2 through 4 until the new state is accepted. The requirement 1, criterion 2 and rule 3 are problem-specific and we will mention these in our protein folding problem. 2D PROTEIN FOLDING Proteins are composed of 20 different amino acids (AAs) in a polypeptide chain and due to the mutual interactions between those AAs, proteins will favor some folded states to lower the Gibbs free energy. The interactions are mostly negative because of hydrophobic effects or ion-ion interactions. In order for proteins to perform certain biological functions, their unique structures are essential. We can use a simple bead-and-chain model for a 2D protein chain, assuming that all the AAs are of the same size and the peptide bond between two AAs is rigid, being only one unit and unstretchable. Each AA occupies one grid point of the 2D space and cannot be in the same point of any other AAs. When protein folds, the non-covalent interactions apply to the two non-bonding AAs separate by one unit. An we can calculate the relative Gibbs free energy ΔG by summing all the interactions of non-bonding neighbors. \Delta G = E_0 = E_{t(i)t(j)} where (i, j) are the indices of two neighboring AAs of types (t(i),t(j)) and E is an 20 × 20 interaction matrix. With this model in mind, we can determine the requirements mentioned in the previous section. 1. Requirement 1: 1. The modified AA cannot occupy other’s positions. 2. The modified AA must be one unit away from its neighbor(s). 3. The best way to modify an AA’s position is to move (1, 0), (1, 1), (0, 1), ( − 1, 1), ( − 1, 0), ( − 1, −1), (0, −1) and (1, −1), eight possible changes. 2. Criterion 2: Evaluate the interaction energy, E₀ and use this number to determine the goodness of the state. The lower, the better. 3. Rule 3: 1. If the new state has lower E₀, the protein will adopt this state in order to reach the minimum of the folding landscape. 2. If the new state has higher E₀, the protein does not favor such state. However, there is still some probability to jump from lower energy state to higher energy ones, P = e−(Enew − E₀)/kT. Once the model is set and the steps are clear, we can start to do the simulation. PROGRAM CODES First, we need a general random number generator, _myrand(seed)_. We will test its validity and then apply it to alter the position of a randomly selected AA. Given different _seed_, the function will give different random number sequences. Our seeds for generating interaction matrix E and the AA sequence in the protein are two distinct yet fixed value. So we will guarantee that we use exactly the same protein and interactions throughout the calculation. Other than those, the seed will be set by _time(NULL)_ and independent of our bias. Second, there are several subroutines to do the Monte Carlo calculations and to make sure the protein is subject to some requirements. Note that the information of proteins is stored in a 45 × 3 matrix with the first column being AA types, second the x positions and third the y positions. 1. _neighbor()_: inputs a Protein Vector and outputs the pairs of indices of two non-bonding AAs. 2. _Energy()_: input pairs of neighbor indices, Protein Vector, interaction matrix E and outputs the energy E₀. 3. _n2ndistance()_: inputs a Protein Vector and outputs the end-to-end distance of the protein. 4. _pcheck()_: inputs Protein Vector and the index of certain AA and check if that AA occupies others’ positions. The function outputs _true_ if the protein is not allowed, _false_ otherwise. 5. _conformationchange()_: input Protein Vector and make a position change to one of its AA and outputs a modified new Protein Vector. RESULTS First we tested the random number generator _myrand()_. The random number generator gives a uniform distribution of numbers between 0 and 1. And the points (xn + 1, xn) cover the 1 × 1 square without noticeable patterns as shown in Fig. 1. We further test it by estimating π. {4} = }{N} where Nin is the number of points in the quarter circle and N is the total number of points.As the total number of points increases, the RHS will reach ${4}$ asympotically, as shown in Fig. 2.