Gromacs 2016.1 Benchmark


Paran
2016-11-13, 16:29:42
I am currently trying to put together a price/performance overview for GROMACS across different CPU/GPU systems.

GROMACS is a software package for simulating and analyzing molecular dynamics processes and was formerly also part of Folding@home.

This is mainly aimed at people who also run Linux on their system.
I have two simulation systems that can be used as a benchmark.
The current version of GROMACS (2016.1 (ftp://ftp.gromacs.org/pub/gromacs/gromacs-2016.1.tar.gz)) is available here (http://manual.gromacs.org/documentation/2016.1/download.html).

The following installation procedure can be used:

tar xfz gromacs-2016.1.tar.gz
cd gromacs-2016.1
mkdir build
cd build
cmake .. -DGMX_BUILD_OWN_FFTW=ON
make
make check
sudo make install
source /usr/local/gromacs/bin/GMXRC

If you also have a recent GPU in your system, there are additional build parameters to use it for the computation as well.
(-DGMX_GPU=on)
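
For illustration, the configure step with GPU support enabled could then look like this (a sketch; CMake detects the CUDA toolkit automatically, and the remaining build steps stay the same):

cmake .. -DGMX_BUILD_OWN_FFTW=ON -DGMX_GPU=on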

If you do not already use your own FFT library on the system, you can use the parameter listed above to compile a bundled FFTW version; note that this requires at least GCC 4.9, otherwise AVX2 support is missing and compilation will likely fail.


-DGMX_SIMD=xxx
This parameter may also be of interest, since fairly recent hardware should be represented here; a sketch of how to set it follows after the list below.

The available SIMD options are:
None For use only on an architecture either lacking SIMD, or to which GROMACS has not yet been ported and none of the options below are applicable.
SSE2 This SIMD instruction set was introduced in Intel processors in 2001, and AMD in 2003. Essentially all x86 machines in existence have this, so it might be a good choice if you need to support dinosaur x86 computers too.
SSE4.1 Present in all Intel core processors since 2007, but notably not in AMD Magny-Cours. Still, almost all recent processors support this, so this can also be considered a good baseline if you are content with slow simulations and prefer portability between reasonably modern processors.
AVX_128_FMA AMD bulldozer processors (2011) have this.
AVX_256 Intel processors since Sandy Bridge (2011). While this code will work on recent AMD processors, it is significantly less efficient than the AVX_128_FMA choice above - do not be fooled to assume that 256 is better than 128 in this case.
AVX2_256 Present on Intel Haswell (and later) processors (2013), and it will also enable Intel 3-way fused multiply-add instructions.
AVX_512 Skylake-EP Xeon processors (2017)
AVX_512_KNL Knights Landing Xeon Phi processors
IBM_QPX BlueGene/Q A2 cores have this.
Sparc64_HPC_ACE Fujitsu machines like the K computer have this.
IBM_VMX Power6 and similar Altivec processors have this.
IBM_VSX Power7 and Power8 have this.
ARM_NEON 32-bit ARMv7 with NEON support.
ARM_NEON_ASIMD 64-bit ARMv8 and later.
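
As a sketch, explicitly selecting the SIMD level for a Haswell-class CPU could look like this (normally the build system auto-detects a suitable level, so setting it is mainly useful when overriding that detection or when cross-building):

cmake .. -DGMX_BUILD_OWN_FFTW=ON -DGMX_SIMD=AVX2_256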


I have uploaded the two test systems here (https://www.dropbox.com/sh/sxrleceqfk3oaa5/AAAJA4W5z27Wud9mZEZaO5fna?dl=0).

The two systems are invoked almost identically by opening a terminal in the respective folder:

System 1:

source /usr/local/gromacs/bin/GMXRC
gmx grompp -f md1.mdp -c out_ions_kubisch.gro -p 3fav_topol.top -o out_ions_kubisch.tpr
gmx mdrun -s out_ions_kubisch.tpr -c out_ions2_kubisch.gro -nb cpu -g md_cpu_kubisch.log


System 2:

source /usr/local/gromacs/bin/GMXRC
gmx grompp -f md1.mdp -c out_ions.gro -p 3fav_topol.top -o out_ions.tpr
gmx mdrun -s out_ions.tpr -c out_ions2.gro -nb cpu -g md_cpu.log



At the end of each resulting log file there is a performance summary.

For CUDA/OpenCL-accelerated benchmarks the invocation is slightly different; a sketch follows below.
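
A possible call (a sketch, assuming a GPU-enabled build; -nb gpu requests GPU offload for the non-bonded interactions, and the log file name is only changed to keep the results separate):

source /usr/local/gromacs/bin/GMXRC
gmx mdrun -s out_ions.tpr -c out_ions2.gro -nb gpu -g md_gpu.log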


On my i7-6700K + GTX 1070 the performance is as follows:

System 1:

Core t (s) Wall t (s) (%)
Time: 2048.138 256.017 800.0
(ns/day) (hour/ns)
Performance: 337.478 0.071


System 2:
Core t (s) Wall t (s) (%)
Time: 11913.140 1489.143 800.0
(ns/day) (hour/ns)
Performance: 58.020 0.414
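
(The two columns are related by hour/ns = 24 / (ns/day); for System 1, 24 / 337.478 ≈ 0.071.)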

If the simulation takes too long for you, the parameter -maxh 0.06 (the number is a multiple of an hour of wall-clock time) can be used to terminate the run after a fixed time, for example:
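
Applied to System 1, the call would then look like this (0.06 hours is roughly 3.6 minutes of wall-clock time):

gmx mdrun -s out_ions_kubisch.tpr -c out_ions2_kubisch.gro -nb cpu -g md_cpu_kubisch.log -maxh 0.06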


I would be happy to see a few results.

Loeschzwerg
2016-11-22, 18:38:37
CPU: Xeon E5-2697 v4 18C/36T @ 2.30GHz
OS: OpenSUSE Tumbleweed


Build OS/arch: Linux 4.8.8-1-default x86_64
Build CPU vendor: Intel
Build CPU brand: Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz
Build CPU family: 6 Model: 79 Stepping: 1
Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma hle htt lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic



System 1

Core t (s) Wall t (s) (%)
Time: 13090.886 363.636 3600.0
(ns/day) (hour/ns)
Performance: 0.732 32.774


Complete log file for System 1:


Log file opened on Tue Nov 22 18:17:03 2016
Host: linux-7k8c pid: 12918 rank ID: 0 number of ranks: 1
:-) GROMACS - gmx mdrun, 2016.1 (-:

GROMACS is written by:
Emile Apol Rossen Apostolov Herman J.C. Berendsen Par Bjelkmar
Aldert van Buuren Rudi van Drunen Anton Feenstra Gerrit Groenhof
Christoph Junghans Anca Hamuraru Vincent Hindriksen Dimitrios Karkoulis
Peter Kasson Jiri Kraus Carsten Kutzner Per Larsson
Justin A. Lemkul Magnus Lundborg Pieter Meulenhoff Erik Marklund
Teemu Murtola Szilard Pall Sander Pronk Roland Schulz
Alexey Shvetsov Michael Shirts Alfons Sijbers Peter Tieleman
Teemu Virolainen Christian Wennberg Maarten Wolf
and the project leaders:
Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel

Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2015, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.

GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.

GROMACS: gmx mdrun, version 2016.1
Executable: /usr/local/bin/gmx
Data prefix: /usr/local
Working dir: /home/loeschzwerg/gromacs-2016.1/System1
Command line:
gmx mdrun -s out_ions_kubisch.tpr -c out_ions2_kubisch.gro -nb cpu -g md_cpu_kubisch.log -maxh 0.10

GROMACS version: 2016.1
Precision: single
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 32)
GPU support: disabled
SIMD instructions: AVX2_256
FFT library: fftw-3.3.5-fma-sse2-avx-avx2-avx2_128-avx512
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
Built on: Di 22. Nov 16:24:42 CET 2016
Built by: loeschzwerg@linux-7k8c [CMAKE]
Build OS/arch: Linux 4.8.8-1-default x86_64
Build CPU vendor: Intel
Build CPU brand: Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz
Build CPU family: 6 Model: 79 Stepping: 1
Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma hle htt lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
C compiler: /usr/bin/cc GNU 6.2.1
C compiler flags: -march=core-avx2 -funroll-all-loops -fexcess-precision=fast
C++ compiler: /usr/bin/c++ GNU 6.2.1
C++ compiler flags: -march=core-avx2 -std=c++0x -funroll-all-loops -fexcess-precision=fast


Running on 1 node with total 18 cores, 36 logical cores
Hardware detected:
CPU info:
Vendor: Intel
Brand: Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz
Family: 6 Model: 79 Stepping: 1
Features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma hle htt lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
SIMD instructions most likely to fit this hardware: AVX2_256
SIMD instructions selected at GROMACS compile time: AVX2_256

Hardware topology: Basic
Sockets, cores, and logical processors:
Socket 0: [ 0 18] [ 1 19] [ 2 20] [ 3 21] [ 4 22] [ 5 23] [ 6 24] [ 7 25] [ 8 26] [ 9 27] [ 10 28] [ 11 29] [ 12 30] [ 13 31] [ 14 32] [ 15 33] [ 16 34] [ 17 35]


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess, E.
Lindahl
GROMACS: High performance molecular simulations through multi-level
parallelism from laptops to supercomputers
SoftwareX 1 (2015) pp. 19-25
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Páll, M. J. Abraham, C. Kutzner, B. Hess, E. Lindahl
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with
GROMACS
In S. Markidis & E. Laure (Eds.), Solving Software Challenges for Exascale 8759 (2015) pp. 3-27
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Pronk, S. Páll, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R.
Shirts, J. C. Smith, P. M. Kasson, D. van der Spoel, B. Hess, and E. Lindahl
GROMACS 4.5: a high-throughput and highly parallel open source molecular
simulation toolkit
Bioinformatics 29 (2013) pp. 845-54
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------

Changing nstlist from 10 to 20, rlist from 1.2 to 1.224

Input Parameters:
integrator = md
tinit = 0
dt = 0.002
nsteps = 500000
init-step = 0
simulation-part = 1
comm-mode = Linear
nstcomm = 100
bd-fric = 0
ld-seed = -864052975
emtol = 10
emstep = 0.01
niter = 20
fcstep = 0
nstcgsteep = 1000
nbfgscorr = 10
rtpi = 0.05
nstxout = 1000
nstvout = 1000
nstfout = 0
nstlog = 1000
nstcalcenergy = 100
nstenergy = 1000
nstxout-compressed = 200
compressed-x-precision = 1000
cutoff-scheme = Verlet
nstlist = 20
ns-type = Grid
pbc = xyz
periodic-molecules = false
verlet-buffer-tolerance = 0.005
rlist = 1.224
coulombtype = PME
coulomb-modifier = Potential-shift
rcoulomb-switch = 0
rcoulomb = 1.2
epsilon-r = 1
epsilon-rf = inf
vdw-type = Cut-off
vdw-modifier = Potential-shift
rvdw-switch = 0
rvdw = 1.2
DispCorr = No
table-extension = 1
fourierspacing = 0.12
fourier-nx = 72
fourier-ny = 72
fourier-nz = 72
pme-order = 4
ewald-rtol = 1e-05
ewald-rtol-lj = 0.001
lj-pme-comb-rule = Geometric
ewald-geometry = 0
epsilon-surface = 0
implicit-solvent = No
gb-algorithm = Still
nstgbradii = 1
rgbradii = 1
gb-epsilon-solvent = 80
gb-saltconc = 0
gb-obc-alpha = 1
gb-obc-beta = 0.8
gb-obc-gamma = 4.85
gb-dielectric-offset = 0.009
sa-algorithm = Ace-approximation
sa-surface-tension = 2.05016
tcoupl = V-rescale
nsttcouple = 10
nh-chain-length = 0
print-nose-hoover-chain-variables = false
pcoupl = Parrinello-Rahman
pcoupltype = Isotropic
nstpcouple = 10
tau-p = 0.8
compressibility (3x3):
compressibility[ 0]={ 4.50000e-05, 0.00000e+00, 0.00000e+00}
compressibility[ 1]={ 0.00000e+00, 4.50000e-05, 0.00000e+00}
compressibility[ 2]={ 0.00000e+00, 0.00000e+00, 4.50000e-05}
ref-p (3x3):
ref-p[ 0]={ 1.00000e+00, 0.00000e+00, 0.00000e+00}
ref-p[ 1]={ 0.00000e+00, 1.00000e+00, 0.00000e+00}
ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 1.00000e+00}
refcoord-scaling = No
posres-com (3):
posres-com[0]= 0.00000e+00
posres-com[1]= 0.00000e+00
posres-com[2]= 0.00000e+00
posres-comB (3):
posres-comB[0]= 0.00000e+00
posres-comB[1]= 0.00000e+00
posres-comB[2]= 0.00000e+00
QMMM = false
QMconstraints = 0
QMMMscheme = 0
MMChargeScaleFactor = 1
qm-opts:
ngQM = 0
constraint-algorithm = Lincs
continuation = false
Shake-SOR = false
shake-tol = 0.0001
lincs-order = 4
lincs-iter = 4
lincs-warnangle = 30
nwall = 0
wall-type = 9-3
wall-r-linpot = -1
wall-atomtype[0] = -1
wall-atomtype[1] = -1
wall-density[0] = 0
wall-density[1] = 0
wall-ewald-zfac = 3
pull = false
rotation = false
interactiveMD = false
disre = No
disre-weighting = Conservative
disre-mixed = false
dr-fc = 1000
dr-tau = 0
nstdisreout = 100
orire-fc = 0
orire-tau = 0
nstorireout = 100
free-energy = no
cos-acceleration = 0
deform (3x3):
deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
simulated-tempering = false
E-x:
n = 0
E-xt:
n = 0
E-y:
n = 0
E-yt:
n = 0
E-z:
n = 0
E-zt:
n = 0
swapcoords = no
userint1 = 0
userint2 = 0
userint3 = 0
userint4 = 0
userreal1 = 0
userreal2 = 0
userreal3 = 0
userreal4 = 0
grpopts:
nrdf: 103928
ref-t: 300
tau-t: 0.1
annealing: No
annealing-npoints: 0
acc: 0 0 0
nfreeze: N N N
energygrp-flags[ 0]: 0


Initializing Domain Decomposition on 36 ranks
Dynamic load balancing: auto
Initial maximum inter charge-group distances:
two-body bonded interactions: 0.405 nm, LJ-14, atoms 654 662
multi-body bonded interactions: 0.474 nm, CMAP Dih., atoms 489 515
Minimum cell size due to bonded interactions: 0.522 nm
Maximum distance for 5 constraints, at 120 deg. angles, all-trans: 0.222 nm
Estimated maximum distance required for P-LINCS: 0.222 nm
Guess for relative PME load: 0.18
Will use 27 particle-particle and 9 PME only ranks
This is a guess, check the performance at the end of the log file
Using 9 separate PME ranks, as guessed by mdrun
Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
Optimizing the DD grid for 27 cells with a minimum initial size of 0.652 nm
The maximum allowed number of cells is: X 12 Y 12 Z 12
Domain decomposition grid 9 x 3 x 1, separate PME ranks 9
PME domain decomposition: 9 x 1 x 1
Interleaving PP and PME ranks
This rank does only particle-particle work.

Domain decomposition rank 0, coordinates 0 0 0

The initial number of communication pulses is: X 2 Y 1
The initial domain decomposition cell size is: X 0.89 nm Y 2.67 nm

The maximum allowed distance for charge groups involved in interactions is:
non-bonded interactions 1.224 nm
(the following are initial values, they could change due to box deformation)
two-body bonded interactions (-rdd) 1.224 nm
multi-body bonded interactions (-rdd) 0.889 nm
atoms separated by up to 5 constraints (-rcon) 0.889 nm

When dynamic load balancing gets turned on, these settings will change to:
The maximum number of communication pulses is: X 2 Y 2
The minimum size for domain decomposition cells is 0.682 nm
The requested allowed shrink of DD cells (option -dds) is: 0.80
The allowed shrink of domain decomposition cells is: X 0.77 Y 0.26
The maximum allowed distance for charge groups involved in interactions is:
non-bonded interactions 1.224 nm
two-body bonded interactions (-rdd) 1.224 nm
multi-body bonded interactions (-rdd) 0.682 nm
atoms separated by up to 5 constraints (-rcon) 0.682 nm

Using 36 MPI threads
Using 1 OpenMP thread per tMPI thread

Will do PME sum in reciprocal space for electrostatic interactions.

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
A smooth particle mesh Ewald method
J. Chem. Phys. 103 (1995) pp. 8577-8592
-------- -------- --- Thank You --- -------- --------

Will do ordinary reciprocal space Ewald sum.
Using a Gaussian width (1/beta) of 0.384195 nm for Ewald
Cut-off's: NS: 1.224 Coulomb: 1.2 LJ: 1.2
System total charge: 0.000
Generated table with 1112 data points for 1-4 COUL.
Tabscale = 500 points/nm
Generated table with 1112 data points for 1-4 LJ6.
Tabscale = 500 points/nm
Generated table with 1112 data points for 1-4 LJ12.
Tabscale = 500 points/nm
Potential shift: LJ r^-12: -1.122e-01 r^-6: -3.349e-01, Ewald -1.000e-05
Initialized non-bonded Ewald correction tables, spacing: 1.02e-03 size: 1176


Using SIMD 4x8 non-bonded kernels

Using Lorentz-Berthelot Lennard-Jones combination rule

Removing pbc first time
Pinning threads with an auto-selected logical core stride of 1

Initializing Parallel LINear Constraint Solver

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess
P-LINCS: A Parallel Linear Constraint Solver for molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 116-122
-------- -------- --- Thank You --- -------- --------

The number of constraints is 1048
There are inter charge-group constraints,
will communicate selected coordinates each lincs iteration

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Miyamoto and P. A. Kollman
SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid
Water Models
J. Comp. Chem. 13 (1992) pp. 952-962
-------- -------- --- Thank You --- -------- --------


Linking all bonded interactions to atoms

Intra-simulation communication will occur every 10 steps.
Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
0: rest

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
G. Bussi, D. Donadio and M. Parrinello
Canonical sampling through velocity rescaling
J. Chem. Phys. 126 (2007) pp. 014101
-------- -------- --- Thank You --- -------- --------

There are: 51423 Atoms
Atom distribution over 27 domains: av 1904 stddev 60 min 1816 max 1975

Constraining the starting coordinates (step 0)

Constraining the coordinates at t0-dt (step 0)
RMS relative constraint deviation after constraining: 0.00e+00
Initial temperature: 2.92511e-07 K

Started mdrun on rank 0 Tue Nov 22 18:17:05 2016
Step Time
0 0.00000

Energies (kJ/mol)
Bond U-B Proper Dih. Improper Dih. CMAP Dih.
4.57925e+02 1.59888e+03 2.15191e+03 9.80102e+01 -9.34528e+02
LJ-14 Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip.
1.34561e+03 2.95220e+04 1.07650e+05 -9.00084e+05 3.03772e+03
Potential Kinetic En. Total Energy Temperature Pressure (bar)
-7.55157e+05 2.67676e+02 -7.54890e+05 6.19546e-01 -3.77469e+03
Constr. rmsd
1.07190e-06

DD step 19 load imb.: force 8.0% pme mesh/force 0.105


step 40 Turning on dynamic load balancing, because the performance loss due to load imbalance is 7.0 %.

DD step 999 vol min/aver 0.805 load imb.: force 1.3% pme mesh/force 0.117

Step Time
1000 2.00000

Energies (kJ/mol)
Bond U-B Proper Dih. Improper Dih. CMAP Dih.
1.25755e+03 3.30517e+03 2.52043e+03 2.73825e+02 -8.66634e+02
LJ-14 Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip.
1.44599e+03 2.93594e+04 9.44752e+04 -8.02333e+05 2.51472e+03
Potential Kinetic En. Total Energy Temperature Pressure (bar)
-6.68047e+05 1.28968e+05 -5.39080e+05 2.98499e+02 -1.53907e+02
Constr. rmsd
1.07797e-06


Step 1520: Run time exceeded 0.099 hours, will terminate the run
Step Time
1540 3.08000

Writing checkpoint, step 1540 at Tue Nov 22 18:23:08 2016


Energies (kJ/mol)
Bond U-B Proper Dih. Improper Dih. CMAP Dih.
1.26855e+03 3.52186e+03 2.52836e+03 3.12481e+02 -8.61454e+02
LJ-14 Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip.
1.33529e+03 2.92574e+04 9.62241e+04 -8.05147e+05 2.42558e+03
Potential Kinetic En. Total Energy Temperature Pressure (bar)
-6.69135e+05 1.30425e+05 -5.38710e+05 3.01873e+02 7.12021e+01
Constr. rmsd
0.00000e+00

<====== ############### ==>
<==== A V E R A G E S ====>
<== ############### ======>

Statistics over 1541 steps using 16 frames

Energies (kJ/mol)
Bond U-B Proper Dih. Improper Dih. CMAP Dih.
1.14966e+03 2.99855e+03 2.43064e+03 2.29412e+02 -8.64918e+02
LJ-14 Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip.
1.38774e+03 2.94729e+04 9.96400e+04 -8.21390e+05 2.42423e+03
Potential Kinetic En. Total Energy Temperature Pressure (bar)
-6.82522e+05 1.17004e+05 -5.65518e+05 2.70809e+02 -2.39929e+02
Constr. rmsd
0.00000e+00

Box-X Box-Y Box-Z
8.05419e+00 8.05419e+00 8.05419e+00

Total Virial (kJ/mol)
4.26217e+04 -3.14673e+02 -7.15206e+02
-3.17536e+02 4.32893e+04 -9.71665e+02
-7.15664e+02 -9.71705e+02 4.21927e+04

Pressure (bar)
-2.35133e+02 1.71222e+01 5.43447e+01
1.73121e+01 -2.81374e+02 6.42371e+01
5.43802e+01 6.42355e+01 -2.03279e+02


M E G A - F L O P S A C C O U N T I N G

NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
W3=SPC/TIP3p W4=TIP4p (single or pairs)
V&F=Potential and force V=Potential only F=Force only

Computing: M-Number M-Flops % Flops
-----------------------------------------------------------------------------
NB VdW [V&F] 5.912817 5.913 0.0
Pair Search distance check 2069.708722 18627.378 0.5
NxN Ewald Elec. + LJ [F] 31533.316784 2081198.908 50.8
NxN Ewald Elec. + LJ [V&F] 351.765424 37638.900 0.9
NxN Ewald Elec. [F] 27831.021424 1697692.307 41.5
NxN Ewald Elec. [V&F] 310.575280 26088.324 0.6
1,4 nonbonded interactions 8.512484 766.124 0.0
Calc Weights 237.728529 8558.227 0.2
Spread Q Bspline 5071.541952 10143.084 0.2
Gather F Bspline 5071.541952 30429.252 0.7
3D-FFT 21292.724352 170341.795 4.2
Solve PME 7.988544 511.267 0.0
Reset In Box 4.010994 12.033 0.0
CG-CoM 4.062417 12.187 0.0
Bonds 1.678149 99.011 0.0
Propers 6.843581 1567.180 0.0
Impropers 0.642597 133.660 0.0
Virial 8.158890 146.860 0.0
Stop-CM 0.874191 8.742 0.0
Calc-Ekin 15.941130 430.411 0.0
Lincs 1.826659 109.600 0.0
Lincs-Mat 10.224840 40.899 0.0
Constraint-V 89.293220 714.346 0.0
Constraint-Vir 8.792116 211.011 0.0
Settle 28.565969 9226.808 0.2
(null) 0.218822 0.000 0.0
-----------------------------------------------------------------------------
Total 4094714.225 100.0
-----------------------------------------------------------------------------


D O M A I N D E C O M P O S I T I O N S T A T I S T I C S

av. #atoms communicated per step for force: 2 x 119382.2
av. #atoms communicated per step for LINCS: 5 x 6571.7

Average load imbalance: 2.4 %
Part of the total run time spent waiting due to load imbalance: 2.2 %
Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 % Y 0 %
Average PME mesh/force load: 0.119
Part of the total run time spent waiting due to PP/PME imbalance: 21.6 %

NOTE: 21.6 % performance was lost because the PME ranks
had less work to do than the PP ranks.
You might want to decrease the number of PME ranks
or decrease the cut-off and the grid spacing.


R E A L C Y C L E A N D T I M E A C C O U N T I N G

On 27 MPI ranks doing PP, and
on 9 MPI ranks doing PME

Computing: Num Num Call Wall time Giga-Cycles
Ranks Threads Count (s) total sum %
-----------------------------------------------------------------------------
Domain decomp. 27 1 78 0.358 22.195 0.1
DD comm. load 27 1 77 0.015 0.950 0.0
DD comm. bounds 27 1 76 0.021 1.322 0.0
Send X to PME 27 1 1541 0.020 1.214 0.0
Neighbor search 27 1 78 4.473 277.616 0.9
Comm. coord. 27 1 1463 0.879 54.552 0.2
Force 27 1 1541 346.599 21511.453 71.5
Wait + Comm. F 27 1 1541 7.411 459.932 1.5
PME mesh * 9 1 1541 40.217 832.022 2.8
PME wait for PP * 323.415 6690.848 22.2
Wait + Recv. PME F 27 1 1541 0.051 3.163 0.0
NB X/F buffer ops. 27 1 4467 0.371 23.042 0.1
Write traj. 27 1 9 0.018 1.134 0.0
Update 27 1 1541 0.493 30.605 0.1
Constraints 27 1 1541 2.565 159.225 0.5
Comm. energies 27 1 155 0.161 10.014 0.0
Rest 0.200 12.422 0.0
-----------------------------------------------------------------------------
Total 363.636 30091.786 100.0
-----------------------------------------------------------------------------
(*) Note that with separate PME ranks, the walltime column actually sums to
twice the total reported, but the cycle count total and % are correct.
-----------------------------------------------------------------------------
Breakdown of PME mesh computation
-----------------------------------------------------------------------------
PME redist. X/F 9 1 3082 1.779 36.804 0.1
PME spread/gather 9 1 3082 32.495 672.270 2.2
PME 3D-FFT 9 1 3082 2.184 45.181 0.2
PME 3D-FFT Comm. 9 1 3082 0.280 5.787 0.0
PME solve Elec 9 1 1541 3.475 71.882 0.2
-----------------------------------------------------------------------------

Core t (s) Wall t (s) (%)
Time: 13090.886 363.636 3600.0
(ns/day) (hour/ns)
Performance: 0.732 32.774
Finished mdrun on rank 0 Tue Nov 22 18:23:08 2016

Loeschzwerg
2016-11-22, 18:39:44
System 2

Core t (s) Wall t (s) (%)
Time: 12893.176 358.144 3600.0
(ns/day) (hour/ns)
Performance: 4.994 4.806



Complete log file for System 2:


Log file opened on Tue Nov 22 18:03:59 2016
Host: linux-7k8c pid: 12665 rank ID: 0 number of ranks: 1
:-) GROMACS - gmx mdrun, 2016.1 (-:

GROMACS is written by:
Emile Apol Rossen Apostolov Herman J.C. Berendsen Par Bjelkmar
Aldert van Buuren Rudi van Drunen Anton Feenstra Gerrit Groenhof
Christoph Junghans Anca Hamuraru Vincent Hindriksen Dimitrios Karkoulis
Peter Kasson Jiri Kraus Carsten Kutzner Per Larsson
Justin A. Lemkul Magnus Lundborg Pieter Meulenhoff Erik Marklund
Teemu Murtola Szilard Pall Sander Pronk Roland Schulz
Alexey Shvetsov Michael Shirts Alfons Sijbers Peter Tieleman
Teemu Virolainen Christian Wennberg Maarten Wolf
and the project leaders:
Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel

Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2015, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.

GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.

GROMACS: gmx mdrun, version 2016.1
Executable: /usr/local/bin/gmx
Data prefix: /usr/local
Working dir: /home/loeschzwerg/gromacs-2016.1/System2
Command line:
gmx mdrun -s out_ions.tpr -c out_ions2.gro -nb cpu -g md_cpu.log -maxh 0.10

GROMACS version: 2016.1
Precision: single
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 32)
GPU support: disabled
SIMD instructions: AVX2_256
FFT library: fftw-3.3.5-fma-sse2-avx-avx2-avx2_128-avx512
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
Built on: Di 22. Nov 16:24:42 CET 2016
Built by: loeschzwerg@linux-7k8c [CMAKE]
Build OS/arch: Linux 4.8.8-1-default x86_64
Build CPU vendor: Intel
Build CPU brand: Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz
Build CPU family: 6 Model: 79 Stepping: 1
Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma hle htt lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
C compiler: /usr/bin/cc GNU 6.2.1
C compiler flags: -march=core-avx2 -funroll-all-loops -fexcess-precision=fast
C++ compiler: /usr/bin/c++ GNU 6.2.1
C++ compiler flags: -march=core-avx2 -std=c++0x -funroll-all-loops -fexcess-precision=fast


Running on 1 node with total 18 cores, 36 logical cores
Hardware detected:
CPU info:
Vendor: Intel
Brand: Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz
Family: 6 Model: 79 Stepping: 1
Features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma hle htt lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
SIMD instructions most likely to fit this hardware: AVX2_256
SIMD instructions selected at GROMACS compile time: AVX2_256

Hardware topology: Basic
Sockets, cores, and logical processors:
Socket 0: [ 0 18] [ 1 19] [ 2 20] [ 3 21] [ 4 22] [ 5 23] [ 6 24] [ 7 25] [ 8 26] [ 9 27] [ 10 28] [ 11 29] [ 12 30] [ 13 31] [ 14 32] [ 15 33] [ 16 34] [ 17 35]


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess, E.
Lindahl
GROMACS: High performance molecular simulations through multi-level
parallelism from laptops to supercomputers
SoftwareX 1 (2015) pp. 19-25
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Páll, M. J. Abraham, C. Kutzner, B. Hess, E. Lindahl
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with
GROMACS
In S. Markidis & E. Laure (Eds.), Solving Software Challenges for Exascale 8759 (2015) pp. 3-27
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Pronk, S. Páll, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R.
Shirts, J. C. Smith, P. M. Kasson, D. van der Spoel, B. Hess, and E. Lindahl
GROMACS 4.5: a high-throughput and highly parallel open source molecular
simulation toolkit
Bioinformatics 29 (2013) pp. 845-54
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------

Changing nstlist from 10 to 25, rlist from 1.2 to 1.226

Input Parameters:
integrator = md
tinit = 0
dt = 0.002
nsteps = 500000
init-step = 0
simulation-part = 1
comm-mode = Linear
nstcomm = 100
bd-fric = 0
ld-seed = 509669040
emtol = 10
emstep = 0.01
niter = 20
fcstep = 0
nstcgsteep = 1000
nbfgscorr = 10
rtpi = 0.05
nstxout = 1000
nstvout = 1000
nstfout = 0
nstlog = 1000
nstcalcenergy = 100
nstenergy = 1000
nstxout-compressed = 200
compressed-x-precision = 1000
cutoff-scheme = Verlet
nstlist = 25
ns-type = Grid
pbc = xyz
periodic-molecules = false
verlet-buffer-tolerance = 0.005
rlist = 1.226
coulombtype = PME
coulomb-modifier = Potential-shift
rcoulomb-switch = 0
rcoulomb = 1.2
epsilon-r = 1
epsilon-rf = inf
vdw-type = Cut-off
vdw-modifier = Potential-shift
rvdw-switch = 0
rvdw = 1.2
DispCorr = No
table-extension = 1
fourierspacing = 0.12
fourier-nx = 72
fourier-ny = 25
fourier-nz = 25
pme-order = 4
ewald-rtol = 1e-05
ewald-rtol-lj = 0.001
lj-pme-comb-rule = Geometric
ewald-geometry = 0
epsilon-surface = 0
implicit-solvent = No
gb-algorithm = Still
nstgbradii = 1
rgbradii = 1
gb-epsilon-solvent = 80
gb-saltconc = 0
gb-obc-alpha = 1
gb-obc-beta = 0.8
gb-obc-gamma = 4.85
gb-dielectric-offset = 0.009
sa-algorithm = Ace-approximation
sa-surface-tension = 2.05016
tcoupl = V-rescale
nsttcouple = 10
nh-chain-length = 0
print-nose-hoover-chain-variables = false
pcoupl = Parrinello-Rahman
pcoupltype = Isotropic
nstpcouple = 10
tau-p = 0.8
compressibility (3x3):
compressibility[ 0]={ 4.50000e-05, 0.00000e+00, 0.00000e+00}
compressibility[ 1]={ 0.00000e+00, 4.50000e-05, 0.00000e+00}
compressibility[ 2]={ 0.00000e+00, 0.00000e+00, 4.50000e-05}
ref-p (3x3):
ref-p[ 0]={ 1.00000e+00, 0.00000e+00, 0.00000e+00}
ref-p[ 1]={ 0.00000e+00, 1.00000e+00, 0.00000e+00}
ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 1.00000e+00}
refcoord-scaling = No
posres-com (3):
posres-com[0]= 0.00000e+00
posres-com[1]= 0.00000e+00
posres-com[2]= 0.00000e+00
posres-comB (3):
posres-comB[0]= 0.00000e+00
posres-comB[1]= 0.00000e+00
posres-comB[2]= 0.00000e+00
QMMM = false
QMconstraints = 0
QMMMscheme = 0
MMChargeScaleFactor = 1
qm-opts:
ngQM = 0
constraint-algorithm = Lincs
continuation = false
Shake-SOR = false
shake-tol = 0.0001
lincs-order = 4
lincs-iter = 4
lincs-warnangle = 30
nwall = 0
wall-type = 9-3
wall-r-linpot = -1
wall-atomtype[0] = -1
wall-atomtype[1] = -1
wall-density[0] = 0
wall-density[1] = 0
wall-ewald-zfac = 3
pull = false
rotation = false
interactiveMD = false
disre = No
disre-weighting = Conservative
disre-mixed = false
dr-fc = 1000
dr-tau = 0
nstdisreout = 100
orire-fc = 0
orire-tau = 0
nstorireout = 100
free-energy = no
cos-acceleration = 0
deform (3x3):
deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
simulated-tempering = false
E-x:
n = 0
E-xt:
n = 0
E-y:
n = 0
E-yt:
n = 0
E-z:
n = 0
E-zt:
n = 0
swapcoords = no
userint1 = 0
userint2 = 0
userint3 = 0
userint4 = 0
userreal1 = 0
userreal2 = 0
userreal3 = 0
userreal4 = 0
grpopts:
nrdf: 15506
ref-t: 300
tau-t: 0.1
annealing: No
annealing-npoints: 0
acc: 0 0 0
nfreeze: N N N
energygrp-flags[ 0]: 0


Initializing Domain Decomposition on 36 ranks
Dynamic load balancing: auto
Initial maximum inter charge-group distances:
two-body bonded interactions: 0.404 nm, LJ-14, atoms 341 349
multi-body bonded interactions: 0.473 nm, CMAP Dih., atoms 489 515
Minimum cell size due to bonded interactions: 0.521 nm
Maximum distance for 5 constraints, at 120 deg. angles, all-trans: 0.222 nm
Estimated maximum distance required for P-LINCS: 0.222 nm
Guess for relative PME load: 0.16
Will use 30 particle-particle and 6 PME only ranks
This is a guess, check the performance at the end of the log file
Using 6 separate PME ranks, as guessed by mdrun
Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
Optimizing the DD grid for 30 cells with a minimum initial size of 0.651 nm
The maximum allowed number of cells is: X 12 Y 4 Z 4
Domain decomposition grid 10 x 3 x 1, separate PME ranks 6
PME domain decomposition: 6 x 1 x 1
Interleaving PP and PME ranks
This rank does only particle-particle work.

Domain decomposition rank 0, coordinates 0 0 0

The initial number of communication pulses is: X 2 Y 2
The initial domain decomposition cell size is: X 0.80 nm Y 1.00 nm

The maximum allowed distance for charge groups involved in interactions is:
non-bonded interactions 1.226 nm
(the following are initial values, they could change due to box deformation)
two-body bonded interactions (-rdd) 1.226 nm
multi-body bonded interactions (-rdd) 0.800 nm
atoms separated by up to 5 constraints (-rcon) 0.800 nm

When dynamic load balancing gets turned on, these settings will change to:
The maximum number of communication pulses is: X 2 Y 2
The minimum size for domain decomposition cells is 0.637 nm
The requested allowed shrink of DD cells (option -dds) is: 0.80
The allowed shrink of domain decomposition cells is: X 0.80 Y 0.64
The maximum allowed distance for charge groups involved in interactions is:
non-bonded interactions 1.226 nm
two-body bonded interactions (-rdd) 1.226 nm
multi-body bonded interactions (-rdd) 0.637 nm
atoms separated by up to 5 constraints (-rcon) 0.637 nm

Using 36 MPI threads
Using 1 OpenMP thread per tMPI thread

Will do PME sum in reciprocal space for electrostatic interactions.

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
A smooth particle mesh Ewald method
J. Chem. Phys. 103 (1995) pp. 8577-8592
-------- -------- --- Thank You --- -------- --------

Will do ordinary reciprocal space Ewald sum.
Using a Gaussian width (1/beta) of 0.384195 nm for Ewald
Cut-off's: NS: 1.226 Coulomb: 1.2 LJ: 1.2
System total charge: 0.000
Generated table with 1113 data points for 1-4 COUL.
Tabscale = 500 points/nm
Generated table with 1113 data points for 1-4 LJ6.
Tabscale = 500 points/nm
Generated table with 1113 data points for 1-4 LJ12.
Tabscale = 500 points/nm
Potential shift: LJ r^-12: -1.122e-01 r^-6: -3.349e-01, Ewald -1.000e-05
Initialized non-bonded Ewald correction tables, spacing: 1.02e-03 size: 1176


Using SIMD 4x8 non-bonded kernels

Using Lorentz-Berthelot Lennard-Jones combination rule

Removing pbc first time
Pinning threads with an auto-selected logical core stride of 1

Initializing Parallel LINear Constraint Solver

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess
P-LINCS: A Parallel Linear Constraint Solver for molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 116-122
-------- -------- --- Thank You --- -------- --------

The number of constraints is 1048
There are inter charge-group constraints,
will communicate selected coordinates each lincs iteration

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Miyamoto and P. A. Kollman
SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid
Water Models
J. Comp. Chem. 13 (1992) pp. 952-962
-------- -------- --- Thank You --- -------- --------


Linking all bonded interactions to atoms

Intra-simulation communication will occur every 5 steps.
Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
0: rest

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
G. Bussi, D. Donadio and M. Parrinello
Canonical sampling through velocity rescaling
J. Chem. Phys. 126 (2007) pp. 014101
-------- -------- --- Thank You --- -------- --------

There are: 7212 Atoms
Atom distribution over 30 domains: av 240 stddev 19 min 216 max 269

Constraining the starting coordinates (step 0)

Constraining the coordinates at t0-dt (step 0)
RMS relative constraint deviation after constraining: 0.00e+00
Initial temperature: 2.24149e-07 K

Started mdrun on rank 0 Tue Nov 22 18:04:00 2016
Step Time
0 0.00000

Energies (kJ/mol)
Bond U-B Proper Dih. Improper Dih. CMAP Dih.
4.85284e+02 1.58065e+03 2.10572e+03 9.12925e+01 -9.53385e+02
LJ-14 Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip.
1.35322e+03 2.90348e+04 5.95284e+03 -1.33278e+05 2.30515e+03
Potential Kinetic En. Total Energy Temperature Pressure (bar)
-9.13221e+04 6.49192e+01 -9.12572e+04 1.00709e+00 -2.72219e+03
Constr. rmsd
6.03865e-07

DD step 24 load imb.: force 22.9% pme mesh/force 0.146


step 50 Turning on dynamic load balancing, because the performance loss due to load imbalance is 17.4 %.

DD step 999 vol min/aver 0.684 load imb.: force 2.9% pme mesh/force 0.188

Step Time
1000 2.00000

Energies (kJ/mol)
Bond U-B Proper Dih. Improper Dih. CMAP Dih.
1.10766e+03 3.10293e+03 2.45351e+03 2.53015e+02 -8.63380e+02
LJ-14 Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip.
1.41183e+03 2.93276e+04 4.16100e+03 -1.23775e+05 4.91487e+02
Potential Kinetic En. Total Energy Temperature Pressure (bar)
-8.23295e+04 1.92458e+04 -6.30836e+04 2.98561e+02 -8.51444e+01
Constr. rmsd
5.87993e-07

DD step 1999 vol min/aver 0.735 load imb.: force 3.6% pme mesh/force 0.180

Step Time
2000 4.00000

Energies (kJ/mol)
Bond U-B Proper Dih. Improper Dih. CMAP Dih.
1.15101e+03 3.42103e+03 2.49168e+03 2.49089e+02 -8.47304e+02
LJ-14 Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip.
1.43224e+03 2.95098e+04 4.96990e+03 -1.26264e+05 4.76299e+02
Potential Kinetic En. Total Energy Temperature Pressure (bar)
-8.34105e+04 1.92940e+04 -6.41165e+04 2.99308e+02 2.10530e+02
Constr. rmsd
6.05529e-07

DD step 2999 vol min/aver 0.779 load imb.: force 4.4% pme mesh/force 0.173

Step Time
3000 6.00000

Energies (kJ/mol)
Bond U-B Proper Dih. Improper Dih. CMAP Dih.
1.32946e+03 3.80932e+03 2.47560e+03 2.56091e+02 -8.35148e+02
LJ-14 Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip.
1.47032e+03 2.93066e+04 5.48726e+03 -1.27714e+05 4.46558e+02
Potential Kinetic En. Total Energy Temperature Pressure (bar)
-8.39676e+04 1.95036e+04 -6.44640e+04 3.02560e+02 1.53300e+02
Constr. rmsd
6.02272e-07

DD step 3999 vol min/aver 0.780 load imb.: force 4.3% pme mesh/force 0.172

Step Time
4000 8.00000

Energies (kJ/mol)
Bond U-B Proper Dih. Improper Dih. CMAP Dih.
1.34496e+03 3.98152e+03 2.46465e+03 2.75545e+02 -8.89528e+02
LJ-14 Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip.
1.44680e+03 2.92855e+04 5.44958e+03 -1.28543e+05 4.23473e+02
Potential Kinetic En. Total Energy Temperature Pressure (bar)
-8.47605e+04 1.90924e+04 -6.56681e+04 2.96181e+02 7.87885e+01
Constr. rmsd
6.15702e-07

DD step 4999 vol min/aver 0.752 load imb.: force 3.4% pme mesh/force 0.174

Step Time
5000 10.00000

Energies (kJ/mol)
Bond U-B Proper Dih. Improper Dih. CMAP Dih.
1.51944e+03 4.23382e+03 2.54841e+03 3.58283e+02 -8.50737e+02
LJ-14 Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip.
1.41869e+03 2.93636e+04 5.52311e+03 -1.28893e+05 4.63342e+02
Potential Kinetic En. Total Energy Temperature Pressure (bar)
-8.43148e+04 1.93465e+04 -6.49682e+04 3.00123e+02 -2.17962e+02
Constr. rmsd
5.89622e-07

DD step 5999 vol min/aver 0.787 load imb.: force 5.2% pme mesh/force 0.176

Step Time
6000 12.00000

Energies (kJ/mol)
Bond U-B Proper Dih. Improper Dih. CMAP Dih.
1.39531e+03 4.55564e+03 2.45689e+03 3.03842e+02 -8.72715e+02
LJ-14 Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip.
1.45696e+03 2.94095e+04 5.40042e+03 -1.29172e+05 4.63184e+02
Potential Kinetic En. Total Energy Temperature Pressure (bar)
-8.46031e+04 1.93519e+04 -6.52512e+04 3.00206e+02 -1.37238e+02
Constr. rmsd
5.90696e-07

DD step 6999 vol min/aver 0.789 load imb.: force 4.4% pme mesh/force 0.172

Step Time
7000 14.00000

Energies (kJ/mol)
Bond U-B Proper Dih. Improper Dih. CMAP Dih.
1.56184e+03 4.35477e+03 2.55603e+03 3.36459e+02 -8.52067e+02
LJ-14 Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip.
1.43060e+03 2.94386e+04 5.90823e+03 -1.30386e+05 4.15196e+02
Potential Kinetic En. Total Energy Temperature Pressure (bar)
-8.52365e+04 1.90741e+04 -6.61624e+04 2.95896e+02 -2.45629e+02
Constr. rmsd
5.65560e-07

DD step 7999 vol min/aver 0.791 load imb.: force 2.6% pme mesh/force 0.171

Step Time
8000 16.00000

Energies (kJ/mol)
Bond U-B Proper Dih. Improper Dih. CMAP Dih.
1.57455e+03 4.33795e+03 2.58736e+03 3.92928e+02 -8.47208e+02
LJ-14 Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip.
1.42187e+03 2.94131e+04 5.11135e+03 -1.30023e+05 3.95257e+02
Potential Kinetic En. Total Energy Temperature Pressure (bar)
-8.56362e+04 1.91859e+04 -6.64503e+04 2.97631e+02 -6.31145e+02
Constr. rmsd
5.70933e-07

DD step 8999 vol min/aver 0.805 load imb.: force 3.9% pme mesh/force 0.167

Step Time
9000 18.00000

Energies (kJ/mol)
Bond U-B Proper Dih. Improper Dih. CMAP Dih.
1.61388e+03 4.43826e+03 2.57273e+03 3.54382e+02 -8.18253e+02
LJ-14 Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip.
1.50830e+03 2.95109e+04 6.18780e+03 -1.30748e+05 4.12635e+02
Potential Kinetic En. Total Energy Temperature Pressure (bar)
-8.49670e+04 1.90672e+04 -6.58998e+04 2.95789e+02 8.85213e+02
Constr. rmsd
6.10086e-07

DD step 9999 vol min/aver 0.796 load imb.: force 4.1% pme mesh/force 0.166

Step Time
10000 20.00000

Energies (kJ/mol)
Bond U-B Proper Dih. Improper Dih. CMAP Dih.
1.57776e+03 4.74697e+03 2.55289e+03 3.28043e+02 -8.49742e+02
LJ-14 Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip.
1.37784e+03 2.92951e+04 5.45122e+03 -1.30367e+05 4.07388e+02
Potential Kinetic En. Total Energy Temperature Pressure (bar)
-8.54799e+04 1.94117e+04 -6.60682e+04 3.01133e+02 -2.63946e+02
Constr. rmsd
5.71543e-07


Step 10325: Run time exceeded 0.099 hours, will terminate the run
Step Time
10350 20.70000

Writing checkpoint, step 10350 at Tue Nov 22 18:09:58 2016


Energies (kJ/mol)
Bond U-B Proper Dih. Improper Dih. CMAP Dih.
1.67200e+03 4.57620e+03 2.52384e+03 3.37593e+02 -8.93834e+02
LJ-14 Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip.
1.37101e+03 2.95503e+04 5.61109e+03 -1.30688e+05 4.04877e+02
Potential Kinetic En. Total Energy Temperature Pressure (bar)
-8.55351e+04 1.95889e+04 -6.59461e+04 3.03883e+02 -2.83641e+02
Constr. rmsd
0.00000e+00

<====== ############### ==>
<==== A V E R A G E S ====>
<== ############### ======>

Statistics over 10351 steps using 104 frames

Energies (kJ/mol)
Bond U-B Proper Dih. Improper Dih. CMAP Dih.
1.41817e+03 3.96515e+03 2.49216e+03 2.87475e+02 -8.56927e+02
LJ-14 Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip.
1.43030e+03 2.93680e+04 5.43192e+03 -1.28639e+05 4.67129e+02
Potential Kinetic En. Total Energy Temperature Pressure (bar)
-8.46358e+04 1.91040e+04 -6.55318e+04 2.96361e+02 -5.47487e+01
Constr. rmsd
0.00000e+00

Box-X Box-Y Box-Z
8.01772e+00 3.00665e+00 3.00665e+00

Total Virial (kJ/mol)
6.32816e+03 1.00093e+01 -2.47304e+01
9.75284e+00 6.52359e+03 6.38380e+01
-2.49631e+01 6.39218e+01 6.61403e+03

Pressure (bar)
-1.48214e+01 4.92950e+00 1.09843e+01
5.04700e+00 -5.01086e+01 -2.51240e+01
1.10923e+01 -2.51621e+01 -9.93162e+01


M E G A - F L O P S A C C O U N T I N G

NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
W3=SPC/TIP3p W4=TIP4p (single or pairs)
V&F=Potential and force V=Potential only F=Force only

Computing: M-Number M-Flops % Flops
-----------------------------------------------------------------------------
NB VdW [V&F] 39.716787 39.717 0.0
Pair Search distance check 1443.258900 12989.330 0.3
NxN Ewald Elec. + LJ [F] 39899.963792 2633397.610 63.9
NxN Ewald Elec. + LJ [V&F] 408.976768 43760.514 1.1
NxN Ewald Elec. [F] 20055.105456 1223361.433 29.7
NxN Ewald Elec. [V&F] 205.628032 17272.755 0.4
1,4 nonbonded interactions 57.178924 5146.103 0.1
Calc Weights 223.954236 8062.352 0.2
Spread Q Bspline 4777.690368 9555.381 0.2
Gather F Bspline 4777.690368 28666.142 0.7
3D-FFT 14400.166286 115201.330 2.8
Solve PME 18.631800 1192.435 0.0
Reset In Box 2.992980 8.979 0.0
CG-CoM 3.000192 9.001 0.0
Bonds 11.272239 665.062 0.0
Propers 45.968791 10526.853 0.3
Impropers 4.316367 897.804 0.0
Virial 8.870232 159.664 0.0
Stop-CM 0.757260 7.573 0.0
Calc-Ekin 29.872104 806.547 0.0
Lincs 13.185392 791.124 0.0
Lincs-Mat 75.131400 300.526 0.0
Constraint-V 88.866253 710.930 0.0
Constraint-Vir 7.574221 181.781 0.0
Settle 20.834678 6729.601 0.2
(null) 1.469842 0.000 0.0
-----------------------------------------------------------------------------
Total 4120440.547 100.0
-----------------------------------------------------------------------------


D O M A I N D E C O M P O S I T I O N S T A T I S T I C S

av. #atoms communicated per step for force: 2 x 30819.7
av. #atoms communicated per step for LINCS: 5 x 1374.1

Average load imbalance: 4.3 %
Part of the total run time spent waiting due to load imbalance: 3.8 %
Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 % Y 0 %
Average PME mesh/force load: 0.173
Part of the total run time spent waiting due to PP/PME imbalance: 13.5 %

NOTE: 13.5 % performance was lost because the PME ranks
had less work to do than the PP ranks.
You might want to decrease the number of PME ranks
or decrease the cut-off and the grid spacing.


R E A L C Y C L E A N D T I M E A C C O U N T I N G

On 30 MPI ranks doing PP, and
on 6 MPI ranks doing PME

Computing: Num Num Call Wall time Giga-Cycles
Ranks Threads Count (s) total sum %
-----------------------------------------------------------------------------
Domain decomp. 30 1 415 0.611 42.152 0.1
DD comm. load 30 1 414 0.025 1.751 0.0
DD comm. bounds 30 1 413 0.059 4.095 0.0
Send X to PME 30 1 10351 0.489 33.694 0.1
Neighbor search 30 1 415 2.906 200.372 0.7
Comm. coord. 30 1 9936 1.403 96.780 0.3
Force 30 1 10351 334.261 23050.814 77.8
Wait + Comm. F 30 1 10351 13.013 897.411 3.0
PME mesh * 6 1 10351 58.181 802.435 2.7
PME wait for PP * 299.962 4137.108 14.0
Wait + Recv. PME F 30 1 10351 0.051 3.528 0.0
NB X/F buffer ops. 30 1 30223 0.534 36.850 0.1
Write traj. 30 1 53 0.032 2.196 0.0
Update 30 1 10351 0.461 31.794 0.1
Constraints 30 1 10351 3.449 237.867 0.8
Comm. energies 30 1 2071 0.516 35.580 0.1
Rest 0.332 22.929 0.1
-----------------------------------------------------------------------------
Total 358.144 29637.376 100.0
-----------------------------------------------------------------------------
(*) Note that with separate PME ranks, the walltime column actually sums to
twice the total reported, but the cycle count total and % are correct.
-----------------------------------------------------------------------------
Breakdown of PME mesh computation
-----------------------------------------------------------------------------
PME redist. X/F 6 1 20702 3.364 46.403 0.2
PME spread/gather 6 1 20702 46.059 635.255 2.1
PME 3D-FFT 6 1 20702 2.960 40.825 0.1
PME 3D-FFT Comm. 6 1 20702 1.493 20.593 0.1
PME solve Elec 6 1 10351 4.278 59.006 0.2
-----------------------------------------------------------------------------

Core t (s) Wall t (s) (%)
Time: 12893.176 358.144 3600.0
(ns/day) (hour/ns)
Performance: 4.994 4.806
Finished mdrun on rank 0 Tue Nov 22 18:09:58 2016



Run time for both tests: 6 min each.

Paran
2016-11-22, 19:06:37
Many thanks, Löschzwerg :smile:
I will have to study your log files more closely; something has gone badly wrong there. :uponder:

I have already measured a few other AMD CPUs (in CPU-only mode), and every one of them was worlds better than your Xeon E5-2697 :eek:

Loeschzwerg
2016-11-22, 19:17:40
I had almost suspected that something could not be right there. There is also a note in the log saying that the settings are not exactly optimal.
But maybe I also messed something up somewhere while compiling... Either way, once you know more I will gladly run a retest :)