Articles | Volume 5, issue 2
https://doi.org/10.5194/soil-5-177-2019
Original research article | 17 Jul 2019

Word embeddings for application in geosciences: development, evaluation, and examples of soil-related concepts

José Padarian and Ignacio Fuentes

Related authors

Additional soil organic carbon storage potential in global croplands
José Padarian, Budiman Minasny, Alex B. McBratney, and Pete Smith
SOIL Discuss., https://doi.org/10.5194/soil-2021-73, 2021
Manuscript not accepted for further review
Game theory interpretation of digital soil mapping convolutional neural networks
José Padarian, Alex B. McBratney, and Budiman Minasny
SOIL, 6, 389–397, https://doi.org/10.5194/soil-6-389-2020, 2020
A new model for intra- and inter-institutional soil data sharing
José Padarian and Alex B. McBratney
SOIL, 6, 89–94, https://doi.org/10.5194/soil-6-89-2020, 2020
Machine learning and soil sciences: a review aided by machine learning tools
José Padarian, Budiman Minasny, and Alex B. McBratney
SOIL, 6, 35–52, https://doi.org/10.5194/soil-6-35-2020, 2020
Using deep learning for digital soil mapping
José Padarian, Budiman Minasny, and Alex B. McBratney
SOIL, 5, 79–89, https://doi.org/10.5194/soil-5-79-2019, 2019

Related subject area

Soil and methods
Spatial prediction of organic carbon in German agricultural topsoil using machine learning algorithms
Ali Sakhaee, Anika Gebauer, Mareike Ließ, and Axel Don
SOIL, 8, 587–604, https://doi.org/10.5194/soil-8-587-2022, 2022
On the benefits of clustering approaches in digital soil mapping: an application example concerning soil texture regionalization
István Dunkl and Mareike Ließ
SOIL, 8, 541–558, https://doi.org/10.5194/soil-8-541-2022, 2022
An open Soil Structure Library based on X-ray CT data
Ulrich Weller, Lukas Albrecht, Steffen Schlüter, and Hans-Jörg Vogel
SOIL, 8, 507–515, https://doi.org/10.5194/soil-8-507-2022, 2022
Identification of thermal signature and quantification of charcoal in soil using differential scanning calorimetry and benzene polycarboxylic acid (BPCA) markers
Brieuc Hardy, Nils Borchard, and Jens Leifeld
SOIL, 8, 451–466, https://doi.org/10.5194/soil-8-451-2022, 2022
Estimating soil fungal abundance and diversity at a macroecological scale with deep learning spectrotransfer functions
Yuanyuan Yang, Zefang Shen, Andrew Bissett, and Raphael A. Viscarra Rossel
SOIL, 8, 223–235, https://doi.org/10.5194/soil-8-223-2022, 2022

Short summary
A large amount of descriptive information is available in geosciences. Given the advances in natural language processing, it is possible to rescue this information and transform it into a numerical form (embeddings). We used 280 764 full-text scientific articles to train a language model capable of generating such embeddings. Our domain-specific embeddings (GeoVec) outperformed general-domain embeddings in tasks such as analogies, relatedness, and categorisation, and can be used in novel applications.
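
To make the analogy task mentioned in the summary concrete, the following is a minimal sketch of how an analogy query ("a is to b as c is to ?") is commonly answered with word embeddings: the query vector b - a + c is formed and the remaining word with the highest cosine similarity is returned. This is not the authors' code and does not use GeoVec. The words, the three-dimensional vectors in TOY_VECTORS, and the function names cosine and solve_analogy are all invented here purely for illustration.

```python
import math

# Hypothetical toy embeddings (invented values, NOT taken from GeoVec).
# Dimensions loosely encode: clay-ness, sand-ness, "consolidated rock".
TOY_VECTORS = {
    "clay":      [1.0, 0.0, 0.0],
    "claystone": [1.0, 0.0, 1.0],
    "sand":      [0.0, 1.0, 0.0],
    "sandstone": [0.0, 1.0, 1.0],
    "silt":      [0.5, 0.5, 0.0],
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' by vector arithmetic.

    The query vector is b - a + c; the three query words themselves
    are excluded from the candidate answers.
    """
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# "claystone is to clay as sandstone is to ?"
answer = solve_analogy(TOY_VECTORS, "claystone", "clay", "sandstone")
print(answer)
```

With real embeddings the vocabulary would hold many thousands of words and the vectors would be learned from text rather than written by hand; the lookup logic stays the same under those assumptions.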