We present a pipeline to estimate baryonic properties of a galaxy inside a dark matter (DM) halo in DM-only simulations using a machine trained on high-resolution hydrodynamic simulations. As an example, we use the IllustrisTNG hydrodynamic simulation of a |$(75 \, \, h^{-1}\, {\rm Mpc})^3$| volume to train our machine to predict e.g. stellar mass and star formation rate in a galaxy-sized halo based purely on its DM content. An extremely randomized tree (ERT) algorithm is used together with multiple novel improvements we introduce here such as a refined error function in machine training and two-stage learning. Aided by these improvements, our model demonstrates a significantly increased accuracy in predicting baryonic properties compared to prior attempts – in other words, the machine better mimics IllustrisTNG’s galaxy–halo correlation. By applying our machine to the MultiDark-Planck DM-only simulation of a large |$(1 \, \, h^{-1}\, {\rm Gpc})^3$| volume, we then validate the pipeline that rapidly generates a galaxy catalogue from a DM halo catalogue using the correlations the machine found in IllustrisTNG. We also compare our galaxy catalogue with the ones produced by popular semi-analytic models (SAMs). Our so-called machine-assisted semisimulation model (MSSM) is shown to be largely compatible with SAMs, and may become a promising method to transplant the baryon physics of galaxy-scale hydrodynamic calculations on to a larger volume DM-only run. We discuss the benefits that machine-based approaches like this entail, as well as suggestions to raise the scientific potential of such approaches.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)