GENESIS Tutorial 8.1 (2022)

8.1 MD simulation of Protein G in the GB/SA model

Implicit solvent models are useful to reduce computational cost in the simulation of biomolecules.¹ The GB/SA (Generalized Born/Solvent accessible surface area) model is one of the popular implicit solvent models, in which the electrostatic contribution to the solvation free energy is computed with the GB theory²^,³, and the non-polar contribution is calculated from the solvent accessible surface area ⁴^,⁵. In the GB theory, solvent molecules surrounding the solute are approximated as a continuum that has the dielectric constant of ~80. To date, various GB models have been developed such as GBSW,⁶ GBMV,⁷ HCT,⁸ and OBC.⁹ In GENESIS, the OBC model and LCPO method ¹⁰ are used for the calculation of the GB term and SA term, respectively. The GB/SA model is introduced only in ATDYN currently, and both CHARMM and AMBER force fields are available. In this tutorial, we learn how to use the GB/SA model with the AMBER ff14SB force field parameters in GENESIS.

0. Preparation

All the files required for this tutorial are hosted in the GENESIS tutorials repository on GitHub.

If you haven’t downloaded the files yet, open your terminal and run the following command:

$ cd ~/GENESIS_Tutorials-2022

# if not yet
$ git clone https://github.com/genesis-release-r-ccs/genesis_tutorial_materials

If you already have the tutorial materials, let’s go to our working directory.

$ cd genesis_tutorial_materials/tutorial-8.1

# Let's take a note
$ echo "tutorial-8.1: MD simulation of Protein G in GBSA" >> README

# Check the contents of Tutorial 8.1
$ ln -s ../../Programs/genesis-2.0.0/bin ./bin
$ ls 
1_setup  2_minimize  3_equilibrate  4_production  5_analysis  bin

1. Setup

In this tutorial, we simulate Protein G in the GB/SA model. In general, GB/SA simulation is performed for a system consisting of only solute molecules (no explicit water). In the following commands, we first make a symbolic link to the PDB file of Protein G (PDB: 2QMT) in the setup directory, and then build a system by using tLEAP in the AMBER tools (see also Tutorial 5.2). Since we use the OBC2 model, we specify set default PBradii mbondi2 in tleap.inp (see the AMBER manual). Here, we use the AMBER ff14SB force field parameters. We obtain proa.crd, proa.prmtop, and proa.pdb as the input files for GENESIS.

# Change directory for the system setup
$ cd 1_setup
$ ls
tleap.inp  vmd.tcl

# Make AMBER CRD and PRMTOP files using tLEAP
$ ln -s ../../../Data/PDB/2QMT.pdb ./
$ vmd -dispdev text -e vmd.tcl > vmd.log
$ tleap -f tleap.inp > tleap.log

$ ls
2QMT.pdb  proa.crd  proa.prmtop  tleap.log  vmd.log
leap.log  proa.pdb  tleap.inp    tmp.pdb    vmd.tcl

# View the initial structure
$ vmd -parm7 proa.prmtop -rst7 proa.crd

2. Minimization

Basically, the simulation scheme with the implicit solvent model is almost the same as those with the explicit solvent model. First, we carry out energy minimization. Let’s take a look at the control file. The following shows the most important part. In the GB/SA simulation, we usually use a longer cutoff distance than that in the explicit solvent simulation, because the electrostatic interaction is truncated at the cut-off distance. In addition, we use non-periodic boundary conditions. In this example, we use the default parameters for GBSA, where the dielectric constants of solute and solvent are set to 1.0 and 78.5, respectively, and the salt concentration is set to 0.2 mol/L, and the surface tension coefficient is 0.005 kcal/mol/Å² (see the GENESIS user manual). Note that in GENESIS only OBC model is available currently, and the OBC2 parameters are the default setting.

# Change directory for energy minimization
$ cd ../2_minimize
$ less INP

[ENERGY]
forcefield       = AMBER   # [AMBER]
electrostatic    = CUTOFF  # [CUTOFF]
switchdist       = 23.0    # switch distance
cutoffdist       = 25.0    # cutoff distance
pairlistdist     = 28.0    # pair-list distance
implicit_solvent = GBSA    # [GBSA]

[BOUNDARY]
type             = NOBC    # [NOBC]

Then, we execute ATDYN. The following is an example command using 4 CPU cores.

# Run ATDYN 
$ export OMP_NUM_THREADS=1
$ mpirun -np 4 ../bin/atdyn INP > log

3. Equilibration

In the equilibration run, we carry out a 10-ps MD simulation at 298.15K with positional restraints on the Cα atoms. The equations of motion are integrated with the velocity Verlet algorithm with the time step of 2 fs, where the SHAKE algorithm is used for bond constraint. The [ENERGY] section is same with that in the previous energy minimization. We use the Langevin thermostat to consider the random effects of the solvent on the solute atoms.

# Change directory for equilibration
$ cd ../3_equilibrate
$ less INP

[ENSEMBLE]
ensemble         = NVT       # [NVE,NVT]
tpcontrol        = LANGEVIN  # [NO,BERENDSEN,BUSSI,LANGEVIN]
temperature      = 298.15    # initial and target temperature (K)

Let’s execute ATDYN. In the log file, the solvation free energy (GB + SA terms) is shown in the 16th column.

# Run ATDYN 
$ mpirun -np 4 ../bin/atdyn INP > log

4. Production run

In the production run, we carry out a 100-ps MD simulation without any restraints. The control file is almost the same as that in the previous equilibration run. The simulation time is extended, and the positional restraint is now turned off. We use INP1 for the GB/SA simulation. For comparison, we also run the gas-phase simulation using INP2, where the GB/SA term is turned off.

# Change directory for production run
$ cd ../4_production

# Run ATDYN (GBSA on)
$ mpirun -np 4 ../bin/atdyn INP1 > log1

# Run ATDYN (GBSA off) 
$ mpirun -np 4 ../bin/atdyn INP2 > log2

Let’s compare the computational time between the two simulations. The computational cost of the GB/SA simulation is about six times larger than that of the gas-phase simulation. However, we can easily see that the secondary structure in Protein G is quickly broken in the gas-phase, suggesting that the solvent effect is important for the stabilization of the protein structure.

5. Analysis

Hydrogen bond formation

Now, we analyze the hydrogen bond formation in the GBSA and gas-phase trajectories. We use “hb_analysis” in the GENESIS analysis tool set, and the control files are already contained in the 5_analysis/H-bonds directory. In the analysis, hydrogen bond is defined if the D…A (donor…acceptor) distance is shorter than 3.4 Å, the D-H…A angle is smaller than 120°, and H-D…A angle is greater than 30°.

# Change directory for the hydrogen bond analysis
$ cd ../5_analysis
$ cd H-bonds
$ ls
INP1  INP2

[OPTION] 
check_only    = NO          # only checking input files (YES/NO)
allow_backup  = NO          # backup existing output files (YES/NO)
analysis_atom = 1           # atom group for searching HB partners
target_atom   = 1           # search the HB partners from this group
hb_distance   = 3.4         # the upper limit of (D .. A) distance (default: 3.4 A)
DHA_angle     = 120.0       # the lower limit of (D-H .. A) angle (default: 120 deg.)
HDA_angle     = 30.0        # the upper limit of (H-D .. A) angle (default: 30 deg.)
boundary_type = NOBC        # (PBC / NOBC)
output_type   = count_snap  # (count_atom / count_snap)
                            # count_atom: number of H-bonds is output for each pair
                            # count_snap: number of H-bonds is output every snapshot

Let’s run hb_analysis:

# Run hb_analysis
$ ../../bin/hb_analysis INP1 > log1
$ ../../bin/hb_analysis INP2 > log2

Then, plot the output_gbsa.out and output_gas.out. You will see that the number of hydrogen bonds in the gas-phase MD trajectory is more than that in the GB/SA MD trajectory. This is because in the gas phase electrostatic interactions between polar groups have been enhanced by the absence of the shielding effect of the solvent, resulting in a broken secondary structure of the protein.

Secondary structure formation

We also analyze the secondary-structure formation tendency of each amino acid in Protein G using the DSSP algorithm. Please, install the DSSP program in your computer, if you don’t have. DSSP has been commonly used to analyze the secondary structure of proteins. However, the program can read only one PDB file. The GENESIS program package provides an interface tool (dssp_interface) to execute DSSP for the snapshot in DCD trajectory files.

Change directory to 5_analysis/DSSP. We assume that you have installed the program in /home/user/DSSP. Before analyzing the DCD files, let’s execute dssp for the initial PDB structure. In the output data, you can see the characters “H”, “E”, “S”, “T” in the 5th column, which indicates α-helix, β-sheet, bend, and turn, respectively (for details, see the DSSP manual).

# Change directory for the analysis
$ cd ../5_analysis/DSSP
$ ls
INP1  INP2  propensity

# Run DSSP for the initial structure
/home/user/DSSP/dssp ../../1_setup/proa.pdb

By using dssp_interface, we can run the DSSP program for each snapshot in the DCD file. Sample control files are included in the directory (INP1 for GBSA, and INP2 for gas-phase). In the control file, we specify a path to the DSSP program. dssp_interface generates temporary_pdb for each snapshot, and dssp analyzes temporary_pdb. This is repeated for all snapshots.

[SELECTION]
group1         = all                     # selection group 1
 
[OPTION]
analysis_atom  = 1                       # protein atom group
dssp_exec      = /home/user/DSSP/dssp    # dssp binary path
temporary_pdb  = temporary_gbsa.pdb      # temporary pdb for DSSP

After the calculation, we obtain dssp_gbsa.out and dssp_gas.out from INP1 and INP2, respectively. In the output files, “raw” results obtained by DSSP for all snapshots are contained.

# Run dssp_interface for INP1 and INP2
$ ../../bin/dssp_interface INP1 > log1
$ ../../bin/dssp_interface INP2 > log2

$ ls
INP1  dssp_gas.out   log1  propensity         temporary_gbsa.pdb
INP2  dssp_gbsa.out  log2  temporary_gas.pdb

Let’s analyze the output files to calculate a secondary structure propensity of each amino acid in the protein. For this purpose, we need an additional analysis tool. The example awk program is included in the directory propensity. Here, calc.awk computes a rate of the secondary structure formation of each amino acid (see Appendix of this page). The α-helix and β-sheet propensities are shown in the 2nd and 3rd column of the output file, respectively. If the propensity is one, the secondary structure was formed in all snapshots.

# Change directory to analyze secondary structure propensity
$ cd propensity
$ ls
calc.awk

# Calculate the secondary structure propensity
$ awk -f calc.awk ../dssp_gbsa.out > gbsa.out
$ awk -f calc.awk ../dssp_gas.out  > gas.out

Let’s plot the results. We can see that the α-helix and β-sheet of Protein G are stable in the GB/SA simulation, while they are unstable in the gas-phase simulation.

Appendix

When you use the program calc.awk for other cases, you must take care about the following red parts. nres is the total number of amino acid residues. The characters “H” and “E” in the 5th column are counted to compute the α-helix and β-sheet propensity, respectively. Note that those characters should be in the 4th column if the PDB file does not include the chain ID.

BEGIN {
  nres = 56;
  calc =  0;
  istr =  0;
  for (i=1; i<=nres; i++) freq_H[i] = 0;
  for (i=1; i<=nres; i++) freq_E[i] = 0
}

{ if ($1 == "SNAPSHOT") istr = istr + 1
  if (calc == 1) {
    if ($5 == "H") freq_H[$1] = freq_H[$1] + 1;
    if ($5 == "E") freq_E[$1] = freq_E[$1] + 1
  }
  if ($2 == "RESIDUE") calc = 1
  if ($1 == nres)      calc = 0
}

END {
  printf("RESNO    HELIX    SHEET\n")
  for (i=1; i<=nres; i++) {
    prop_H[i] = freq_H[i]/istr;
    prop_E[i] = freq_E[i]/istr;
    printf("%5d %8.5f %8.5f\n",i, prop_H[i], prop_E[i])
  }
}

Written by Takaharu Mori@RIKEN Theoretical molecular science laboratory March 22, 2022

References

T. Mori et al., BBA-Biomembranes, 1858, 1635-1651 (2016). ↩
W. C. Still et al., J. Am. Chem. Soc., 112, 6127-6129 ↩
D. Bashford and D. A. Case, Annu. Rev. Phys. Chem., 51,129-152 (2000). ↩
D. Eisenberg and A. D. McLachlan, Nature, 319, 199-203 (1986). ↩
T. Ooi et al., Proc. Natl. Acad. Sci. U.S.A., 84, 3086-3090 (1987). ↩
W. Im et al., J. Comput. Chem., 24, 1691-1702, (2003). ↩
M. S. Lee et al., J. Chem. Phys., 116, 10606 (2002). ↩
G. D. Hawkins et al., J. Phys. Chem., 100, 19824-19839 (1996). ↩
A. Onufriev et al., Proteins, 55, 383-394 (2004). ↩
J. Weiser et al., J. Comput. Chem. 20, 217-230 (1999). ↩