Commit 2b62c375 authored by Angelo Ziletti

Add the missing data files

parent 47dd6d81
5
-420.9337343048892 2.534576 2.43222 -14.1341 -10.9488 -16.013 1.087 0.1323 4.162 15.72523 -3.98613 36.7680730972 39.6946262426 0.621841405
C 0.99813803 -0.00263872 -0.00464602
H 2.09441750 -0.00242373 0.00417336
H 0.63238996 1.03082951 0.00417296
H 0.62561232 -0.52974905 0.88151021
H 0.64010219 -0.50924801 -0.90858051
8
-718.4191786915518 4.332028 4.445 -12.5472 -9.5577 -13.549 1.1186 0.1452 3.684 13.41121 -3.49181 11.7174333146 22.62248556 0.62832842
C 0.99566434 -0.00295079 -0.00645530
C 2.52433599 -0.00704005 0.00062949
H 0.59642533 1.02180902 -0.00238364
H 0.58817563 -0.51880627 0.87523331
H 0.59641749 -0.50854984 -0.89780318
H 2.92359554 0.50116660 0.89048719
H 2.93182660 0.50621547 -0.88257380
H 2.92355907 -1.03181414 -0.00043407
6
-570.0248559034032 4.173282 3.92308 -10.4383 -7.8768 -10.16 1.9456 -0.062 1.231 9.75211 -0.82042 7.1372664886 22.9406708318 0.709059588
C 0.98946692 0.00007550 0.00000000
C 2.32461012 -0.00013585 0.00000000
H 0.41663940 0.92974933 0.00000000
H 0.41634287 -0.92941467 0.00000000
H 2.89773268 0.92935495 0.00000000
H 2.89743801 -0.92980927 0.00000000
4
-410.2861527671238 3.512394 3.32175 -11.1629 -8.4021 -10.706 1.1958 0.4143 2.145 10.46577 -1.89847 4.6241074644 15.973798303 2.22847974
C 0.98972410 0.00000000 0.00000000
C 2.20043588 0.00000000 0.00000000
H -0.08202857 0.00000000 0.00000000
H 3.27218859 0.00000000 0.00000000
9
-868.8554219717826 5.48016 5.78632 -10.8132 -8.3534 -12.365 1.2002 0.269 3.825 11.96491 -3.57487 10.3116552494 17.9005159458 0.556173673
C -0.02685201 0.87078057 -0.05692871
C -0.73928196 -0.46068850 -0.05716072
C 0.76666931 -0.41017260 0.04288779
H 0.01672181 1.42791936 -0.99323318
H -0.10696079 1.49356439 0.83453787
H -1.17833564 -0.80568159 -0.99374138
H -1.30207225 -0.73988932 0.83406126
H 1.22416472 -0.65503779 1.00190446
H 1.34779680 -0.72084452 -0.82587738
Data set dsgdb7njp
==================
14 electronic properties for 7k small organic molecules.
Please cite this publication if you use this data set:
* Grégoire Montavon, Matthias Rupp, Vivekanand Gobre, Alvaro Vazquez-Mayagoitia,
Katja Hansen, Alexandre Tkatchenko, Klaus-Robert Müller, O. Anatole von Lilienfeld:
Machine learning of molecular electronic properties in chemical compound space,
New Journal of Physics, 15(9): 095003, IOP Publishing, 2013.
DOI: 10.1088/1367-2630/15/9/095003
Related publications:
* Matthias Rupp, Alexandre Tkatchenko, Klaus-Robert Müller, O. Anatole von
Lilienfeld: Fast and Accurate Modeling of Molecular Atomization Energies with
Machine Learning, Physical Review Letters, 108(5): 058301, 2012.
DOI: 10.1103/PhysRevLett.108.058301
* Grégoire Montavon, Katja Hansen, Siamac Fazli, Matthias Rupp, Franziska
Biegler, Andreas Ziehe, Alexandre Tkatchenko, O. Anatole von Lilienfeld,
Klaus-Robert Müller: Learning Invariant Representations of Molecules for
Atomization Energy Prediction, Advances in Neural Information Processing
Systems 25 (NIPS 2012), Lake Tahoe, Nevada, USA, December 3-6, 2012.
A version of this data set with pre-calculated Coulomb matrices is available at
http://quantum-machine.org (last accessed 2013-04-08).
Files
-----
dsgdb7njp.xyz - Molecules and properties in XYZ format.
dsgdb7njp_cvsplits.txt - Indices of 5-fold stratified cross-validation splits.
dsgdb7njp_subset1k.txt - Indices of stratified subsets of 1000 molecules.
readme.txt - Documentation.
Molecules
---------
A subset of 7211 small organic molecules from the GDB-13 database [1]. It
contains all molecules with up to 7 non-hydrogen atoms and elements H,C,N,O,S,Cl.
Molecular geometries were generated using the universal force field [2] as
implemented in OpenBabel [3] and subsequently relaxed using the PBE
approximation [4] to Kohn-Sham density functional theory [5] as implemented in
FHI-aims [6]. Coordinates are in Angstrom; to convert to atomic units, multiply
by 100/52.917720859.
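
The unit conversion above can be sketched in a few lines of Python. This is only an illustration: the helper name `angstrom_to_bohr` and the constant name are assumptions, and the Bohr radius value is simply the one implied by the factor 100/52.917720859 quoted above.

```python
# Bohr radius in Angstrom, as implied by the factor 100/52.917720859 in this readme.
BOHR_IN_ANGSTROM = 0.52917720859


def angstrom_to_bohr(coords):
    """Convert a list of (x, y, z) tuples from Angstrom to atomic units (bohr)."""
    factor = 1.0 / BOHR_IN_ANGSTROM  # identical to 100/52.917720859
    return [tuple(c * factor for c in xyz) for xyz in coords]


# Carbon atom of the first molecule shown in dsgdb7njp.xyz (five atoms: CH4).
carbon_bohr = angstrom_to_bohr([(0.99813803, -0.00263872, -0.00464602)])
print(carbon_bohr)
```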
Properties
----------
I.  Identifier  Unit        Description
--  ----------  ----------  -----------
01  ae_pbe0     kcal/mol    Atomization energy (DFT/PBE0)
02  p_pbe0      Angstrom^3  Polarizability (DFT/PBE0)
03  p_scs       Angstrom^3  Polarizability (self-consistent screening)
04  homo_gw     eV          Highest occupied molecular orbital (GW)
05  homo_pbe0   eV          Highest occupied molecular orbital (DFT/PBE0)
06  homo_zindo  eV          Highest occupied molecular orbital (ZINDO/s)
07  lumo_gw     eV          Lowest unoccupied molecular orbital (GW)
08  lumo_pbe0   eV          Lowest unoccupied molecular orbital (DFT/PBE0)
09  lumo_zindo  eV          Lowest unoccupied molecular orbital (ZINDO/s)
10  ip_zindo    eV          Ionization potential (ZINDO/s)
11  ea_zindo    eV          Electron affinity (ZINDO/s)
12  e1_zindo    eV          First excitation energy (ZINDO)
13  emax_zindo  eV          Excitation energy at maximal absorption (ZINDO)
14  imax_zindo  arbitrary   Maximal absorption intensity (ZINDO)
I. = Index, DFT/PBE0 = density functional theory with PBE0 functional,
GW = Hedin's GW approximation, ZINDO = Zerner's intermediate neglect of
differential overlap. Divide ae_pbe0 by 23.045108 to convert to eV.
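
A minimal sketch of reading one record of dsgdb7njp.xyz follows. It assumes that the second line of each XYZ block lists the 14 properties in the index order of the table above; this is inferred from the data shown in this commit, not stated explicitly. The names `parse_xyz`, `PROPERTY_NAMES` and `KCAL_PER_MOL_PER_EV` are illustrative, and the conversion factor is the one quoted in this readme.

```python
# Property identifiers in the index order of the table (assumed column order).
PROPERTY_NAMES = [
    "ae_pbe0", "p_pbe0", "p_scs", "homo_gw", "homo_pbe0", "homo_zindo",
    "lumo_gw", "lumo_pbe0", "lumo_zindo", "ip_zindo", "ea_zindo",
    "e1_zindo", "emax_zindo", "imax_zindo",
]
KCAL_PER_MOL_PER_EV = 23.045108  # factor quoted in this readme


def parse_xyz(lines):
    """Yield one dict per molecule from an iterable of text lines."""
    it = iter(lines)
    for line in it:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        n_atoms = int(line)  # first line of a record: atom count
        values = [float(v) for v in next(it).split()]  # second line: properties
        atoms = []
        for _ in range(n_atoms):
            symbol, x, y, z = next(it).split()
            atoms.append((symbol, float(x), float(y), float(z)))
        yield {"properties": dict(zip(PROPERTY_NAMES, values)), "atoms": atoms}


# One record copied from the data file in this commit (four atoms: C2H2).
sample = """4
-410.2861527671238 3.512394 3.32175 -11.1629 -8.4021 -10.706 1.1958 0.4143 2.145 10.46577 -1.89847 4.6241074644 15.973798303 2.22847974
C 0.98972410 0.00000000 0.00000000
C 2.20043588 0.00000000 0.00000000
H -0.08202857 0.00000000 0.00000000
H 3.27218859 0.00000000 0.00000000
""".splitlines()

molecules = list(parse_xyz(sample))
ae_ev = molecules[0]["properties"]["ae_pbe0"] / KCAL_PER_MOL_PER_EV
print(len(molecules[0]["atoms"]), "atoms, atomization energy in eV:", ae_ev)
```

For the full file, the same generator could be fed an open file handle, for example `parse_xyz(open("dsgdb7njp.xyz"))`.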
Cross-validation splits
-----------------------
Indices (starting from 1) for 5-fold stratified cross-validation are provided
for each property. Stratification is by property so that each fold covers
the whole property range.
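
Because the indices start from 1, they need shifting before use with 0-based arrays. The sketch below shows one way to turn five folds into a train/test split. The layout of dsgdb7njp_cvsplits.txt is not described here, so reading the file is left out; the function `train_test_indices` and the toy folds are hypothetical.

```python
def train_test_indices(folds, test_fold):
    """Return 0-based (train, test) index lists.

    folds: list of lists of 1-based molecule indices, one list per fold.
    test_fold: position of the fold held out for testing.
    """
    test = sorted(i - 1 for i in folds[test_fold])
    train = sorted(
        i - 1
        for k, fold in enumerate(folds)
        if k != test_fold
        for i in fold
    )
    return train, test


# Hypothetical toy folds for 10 molecules (not taken from the data set).
toy_folds = [[1, 6], [2, 7], [3, 8], [4, 9], [5, 10]]
train, test = train_test_indices(toy_folds, test_fold=0)
print("train:", train)
print("test:", test)
```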
1k subset
---------
Indices (starting from 1) for a stratified subset of 1000 molecules are
provided for each property. Stratification is by property so that the subset
covers the whole property range.
References
----------
[1] Lorenz C. Blum, Jean-Louis Reymond: 970 Million Druglike Small Molecules
for Virtual Screening in the Chemical Universe Database GDB-13, Journal of
the American Chemical Society 131(25): 8732-8733, 2009. DOI: 10.1021/ja902302h
[2] Anthony K. Rappé, Carla J. Casewit, K. S. Colwell, William A. Goddard III,
W. Mason Skiff: UFF, a full periodic table force field for molecular
mechanics and molecular dynamics simulations, Journal of the American
Chemical Society 114(25): 10024-10035, 1992. DOI: 10.1021/ja00051a040
[3] Rajarshi Guha, Michael T. Howard, Geoffrey R. Hutchison, Peter Murray-Rust,
Henry Rzepa, Christoph Steinbeck, Jörg Wegner, Egon L. Willighagen: The
Blue Obelisk - Interoperability in Chemical Informatics, Journal of
Chemical Information and Modeling 46(3): 991-998, 2006. DOI: 10.1021/ci050400b
[4] John P. Perdew, Kieron Burke, Matthias Ernzerhof: Generalized Gradient
Approximation Made Simple, Physical Review Letters 77(18): 3865-3868, 1996.
DOI: 10.1103/PhysRevLett.77.3865
[5] Walter Kohn, Lu J. Sham: Self-consistent equations including exchange and
correlation effects, Physical Review 140(4A): A1133-A1138, 1965.
DOI: 10.1103/PhysRev.140.A1133
[6] Volker Blum, Ralf Gehrke, Felix Hanke, Paula Havu, Ville Havu, Xinguo Ren,
Karsten Reuter, Matthias Scheffler: Ab initio molecular simulations with
numeric atom-centered orbitals, Computer Physics Communications 180(11):
2175-2196, 2009. DOI: 10.1016/j.cpc.2009.06.022