<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD with MathML3 v1.2 20190208//EN" "JATS-archivearticle1-mathml3.dtd"> <article xmlns:ali="http://www.niso.org/schemas/ali/1.0/" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="1.2"><front><journal-meta><journal-id journal-id-type="nlm-ta">elife</journal-id><journal-id journal-id-type="publisher-id">eLife</journal-id><journal-title-group><journal-title>eLife</journal-title></journal-title-group><issn publication-format="electronic" pub-type="epub">2050-084X</issn><publisher><publisher-name>eLife Sciences Publications, Ltd</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="publisher-id">77220</article-id><article-id pub-id-type="doi">10.7554/eLife.77220</article-id><article-categories><subj-group subj-group-type="display-channel"><subject>Research Advance</subject></subj-group><subj-group subj-group-type="heading"><subject>Neuroscience</subject></subj-group></article-categories><title-group><article-title>Flexible and efficient simulation-based inference for models of decision-making</article-title></title-group><contrib-group><contrib contrib-type="author" corresp="yes" id="author-267576"><name><surname>Boelts</surname><given-names>Jan</given-names></name><contrib-id authenticated="true" contrib-id-type="orcid">https://orcid.org/0000-0003-4979-7092</contrib-id><email>jan.boelts@uni-tuebingen.de</email><xref ref-type="aff" rid="aff1">1</xref><xref ref-type="aff" rid="aff2">2</xref><xref ref-type="other" rid="fund2"/><xref ref-type="other" rid="fund3"/><xref ref-type="other" rid="fund8"/><xref ref-type="other" rid="fund6"/><xref ref-type="fn" rid="con1"/><xref ref-type="fn" rid="conf1"/></contrib><contrib contrib-type="author" id="author-267744"><name><surname>Lueckmann</surname><given-names>Jan-Matthis</given-names></name><xref ref-type="aff" 
rid="aff1">1</xref><xref ref-type="other" rid="fund1"/><xref ref-type="other" rid="fund3"/><xref ref-type="other" rid="fund7"/><xref ref-type="other" rid="fund4"/><xref ref-type="fn" rid="con2"/><xref ref-type="fn" rid="conf2"/></contrib><contrib contrib-type="author" id="author-267578"><name><surname>Gao</surname><given-names>Richard</given-names></name><contrib-id authenticated="true" contrib-id-type="orcid">https://orcid.org/0000-0001-5916-6433</contrib-id><xref ref-type="aff" rid="aff1">1</xref><xref ref-type="other" rid="fund3"/><xref ref-type="other" rid="fund5"/><xref ref-type="fn" rid="con3"/><xref ref-type="fn" rid="conf2"/></contrib><contrib contrib-type="author" id="author-267579"><name><surname>Macke</surname><given-names>Jakob H</given-names></name><contrib-id authenticated="true" contrib-id-type="orcid">https://orcid.org/0000-0001-5154-8912</contrib-id><xref ref-type="aff" rid="aff1">1</xref><xref ref-type="aff" rid="aff3">3</xref><xref ref-type="other" rid="fund1"/><xref ref-type="other" rid="fund2"/><xref ref-type="other" rid="fund3"/><xref ref-type="other" rid="fund7"/><xref ref-type="other" rid="fund8"/><xref ref-type="other" rid="fund4"/><xref ref-type="other" rid="fund6"/><xref ref-type="fn" rid="con4"/><xref ref-type="fn" rid="conf1"/></contrib><aff id="aff1"><label>1</label><institution-wrap><institution-id institution-id-type="ror">https://ror.org/03a1kwz48</institution-id><institution>Machine Learning in Science, Excellence Cluster Machine Learning, University of Tübingen</institution></institution-wrap><addr-line><named-content content-type="city">Tübingen</named-content></addr-line><country>Germany</country></aff><aff id="aff2"><label>2</label><institution-wrap><institution-id institution-id-type="ror">https://ror.org/02kkvpp62</institution-id><institution>Technical University of Munich</institution></institution-wrap><addr-line><named-content content-type="city">Munich</named-content></addr-line><country>Germany</country></aff><aff 
id="aff3"><label>3</label><institution-wrap><institution-id institution-id-type="ror">https://ror.org/04fq9j139</institution-id><institution>Max Planck Institute for Intelligent Systems Tübingen</institution></institution-wrap><addr-line><named-content content-type="city">Tübingen</named-content></addr-line><country>Germany</country></aff></contrib-group><contrib-group content-type="section"><contrib contrib-type="editor"><name><surname>Wyart</surname><given-names>Valentin</given-names></name><role>Reviewing Editor</role><aff><institution-wrap><institution-id institution-id-type="ror">https://ror.org/013cjyk83</institution-id><institution>École normale supérieure, PSL University, INSERM</institution></institution-wrap><country>France</country></aff></contrib><contrib contrib-type="senior_editor"><name><surname>Behrens</surname><given-names>Timothy E</given-names></name><role>Senior Editor</role><aff><institution-wrap><institution-id institution-id-type="ror">https://ror.org/052gg0110</institution-id><institution>University of Oxford</institution></institution-wrap><country>United Kingdom</country></aff></contrib></contrib-group><pub-date publication-format="electronic" date-type="publication"><day>27</day><month>07</month><year>2022</year></pub-date><pub-date pub-type="collection"><year>2022</year></pub-date><volume>11</volume><elocation-id>e77220</elocation-id><history><date date-type="received" iso-8601-date="2022-01-25"><day>25</day><month>01</month><year>2022</year></date><date date-type="accepted" iso-8601-date="2022-07-26"><day>26</day><month>07</month><year>2022</year></date></history><pub-history><event><event-desc>This manuscript was published as a preprint at .</event-desc><date date-type="preprint" iso-8601-date="2021-12-23"><day>23</day><month>12</month><year>2021</year></date><self-uri content-type="preprint" xlink:href="https://doi.org/10.1101/2021.12.22.473472"/></event></pub-history><permissions><copyright-statement>© 2022, Boelts et 
al</copyright-statement><copyright-year>2022</copyright-year><copyright-holder>Boelts et al</copyright-holder><ali:free_to_read/><license xlink:href="http://creativecommons.org/licenses/by/4.0/"><ali:license_ref>http://creativecommons.org/licenses/by/4.0/</ali:license_ref><license-p>This article is distributed under the terms of the <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution License</ext-link>, which permits unrestricted use and redistribution provided that the original author and source are credited.</license-p></license></permissions><self-uri content-type="pdf" xlink:href="elife-77220-v2.pdf"/><self-uri content-type="figures-pdf" xlink:href="elife-77220-figures-v2.pdf"/><related-article related-article-type="article-reference" ext-link-type="doi" xlink:href="10.7554/eLife.65074" id="ra1"/><abstract><p>Inferring parameters of computational models that capture experimental data is a central task in cognitive neuroscience. Bayesian statistical inference methods usually require the ability to evaluate the likelihood of the model—however, for many models of interest in cognitive neuroscience, the associated likelihoods cannot be computed efficiently. Simulation-based inference (SBI) offers a solution to this problem by only requiring access to simulations produced by the model. Previously, Fengler et al. introduced likelihood approximation networks (LANs, Fengler et al., 2021), which make it possible to apply SBI to models of decision-making, but require billions of simulations for training. Here, we provide a new SBI method that is substantially more simulation efficient. Our approach, mixed neural likelihood estimation (MNLE), trains neural density estimators on model simulations to emulate the simulator, and is designed to capture both the continuous (e.g., reaction times) and discrete (choices) data of decision-making models. 
The likelihoods of the emulator can then be used to perform Bayesian parameter inference on experimental data using standard approximate inference methods like Markov Chain Monte Carlo sampling. We demonstrate MNLE on two variants of the drift-diffusion model and show that it is substantially more efficient than LANs: MNLE achieves similar likelihood accuracy with six orders of magnitude fewer training simulations, and is significantly more accurate than LANs when both are trained with the same budget. Our approach enables researchers to perform SBI on custom-tailored models of decision-making, leading to fast iteration of model design for scientific discovery.</p></abstract><kwd-group kwd-group-type="author-keywords"><kwd>computational modeling</kwd><kwd>decision-making</kwd><kwd>Bayesian inference</kwd><kwd>simulation-based inference</kwd><kwd>machine learning</kwd></kwd-group><kwd-group kwd-group-type="research-organism"><title>Research organism</title><kwd>None</kwd></kwd-group><funding-group><award-group id="fund1"><funding-source><institution-wrap><institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/501100001659</institution-id><institution>Deutsche Forschungsgemeinschaft</institution></institution-wrap></funding-source><award-id>SFB 1233</award-id><principal-award-recipient><name><surname>Lueckmann</surname><given-names>Jan-Matthis</given-names></name><name><surname>Macke</surname><given-names>Jakob H</given-names></name></principal-award-recipient></award-group><award-group id="fund2"><funding-source><institution-wrap><institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/501100001659</institution-id><institution>Deutsche Forschungsgemeinschaft</institution></institution-wrap></funding-source><award-id>SPP 2041</award-id><principal-award-recipient><name><surname>Boelts</surname><given-names>Jan</given-names></name><name><surname>Macke</surname><given-names>Jakob 
H</given-names></name></principal-award-recipient></award-group><award-group id="fund3"><funding-source><institution-wrap><institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/501100001659</institution-id><institution>Deutsche Forschungsgemeinschaft</institution></institution-wrap></funding-source><award-id>Germany's Excellence Strategy MLCoE</award-id><principal-award-recipient><name><surname>Boelts</surname><given-names>Jan</given-names></name><name><surname>Lueckmann</surname><given-names>Jan-Matthis</given-names></name><name><surname>Gao</surname><given-names>Richard</given-names></name><name><surname>Macke</surname><given-names>Jakob H</given-names></name></principal-award-recipient></award-group><award-group id="fund4"><funding-source><institution-wrap><institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/501100002347</institution-id><institution>Bundesministerium für Bildung und Forschung</institution></institution-wrap></funding-source><award-id>ADIMEM</award-id><principal-award-recipient><name><surname>Lueckmann</surname><given-names>Jan-Matthis</given-names></name><name><surname>Macke</surname><given-names>Jakob H</given-names></name></principal-award-recipient></award-group><award-group id="fund5"><funding-source><institution-wrap><institution>HORIZON EUROPE Marie Sklodowska-Curie Actions</institution></institution-wrap></funding-source><award-id>101030918</award-id><principal-award-recipient><name><surname>Gao</surname><given-names>Richard</given-names></name></principal-award-recipient></award-group><award-group id="fund6"><funding-source><institution-wrap><institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/501100002347</institution-id><institution>Bundesministerium für Bildung und Forschung</institution></institution-wrap></funding-source><award-id>Tübingen AI 
Center</award-id><principal-award-recipient><name><surname>Boelts</surname><given-names>Jan</given-names></name><name><surname>Macke</surname><given-names>Jakob H</given-names></name></principal-award-recipient></award-group><award-group id="fund7"><funding-source><institution-wrap><institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/501100002347</institution-id><institution>Bundesministerium für Bildung und Forschung</institution></institution-wrap></funding-source><award-id>FKZ 01IS18052 A-D</award-id><principal-award-recipient><name><surname>Lueckmann</surname><given-names>Jan-Matthis</given-names></name><name><surname>Macke</surname><given-names>Jakob H</given-names></name></principal-award-recipient></award-group><award-group id="fund8"><funding-source><institution-wrap><institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/501100002347</institution-id><institution>Bundesministerium für Bildung und Forschung</institution></institution-wrap></funding-source><award-id>KZ 01IS18039A</award-id><principal-award-recipient><name><surname>Boelts</surname><given-names>Jan</given-names></name><name><surname>Macke</surname><given-names>Jakob H</given-names></name></principal-award-recipient></award-group><funding-statement>The funders had no role in study design, data collection, and interpretation, or the decision to submit the work for publication.</funding-statement></funding-group><custom-meta-group><custom-meta specific-use="meta-only"><meta-name>Author impact statement</meta-name><meta-value>A new machine learning method makes it possible to efficiently identify the parameters of cognitive models using Bayesian inference, even when only model simulations are available.</meta-value></custom-meta></custom-meta-group></article-meta></front><body><sec id="s1" sec-type="intro"><title>Introduction</title><p>Computational modeling is an essential part of the scientific process in cognitive neuroscience: Models are developed from prior 
knowledge and hypotheses, and compared to experimentally observed phenomena (<xref ref-type="bibr" rid="bib8">Churchland and Sejnowski, 1988</xref>; <xref ref-type="bibr" rid="bib37">McClelland, 2009</xref>). Computational models usually have free parameters which need to be tuned to find those models that capture experimental data. This is often approached by searching for single best-fitting parameters using grid search or optimization methods. While this point-wise approach has been used successfully (<xref ref-type="bibr" rid="bib32">Lee et al., 2016</xref>; <xref ref-type="bibr" rid="bib48">Patil et al., 2016</xref>), it can be scientifically more informative to perform Bayesian inference over the model parameters: Bayesian inference takes into account prior knowledge, reveals <italic>all</italic> the parameters consistent with observed data, and thus can be used for quantifying uncertainty, hypothesis testing, and model selection (<xref ref-type="bibr" rid="bib29">Lee, 2008</xref>; <xref ref-type="bibr" rid="bib58">Shiffrin et al., 2008</xref>; <xref ref-type="bibr" rid="bib31">Lee and Wagenmakers, 2014</xref>; <xref ref-type="bibr" rid="bib57">Schad et al., 2021</xref>). Yet, as the complexity of models used in cognitive neuroscience increases, Bayesian inference becomes challenging for two reasons. First, for many commonly used models, computational evaluation of likelihoods is challenging because often no analytical form is available. Numerical approximations of the likelihood are typically computationally expensive, rendering standard approximate inference methods like Markov Chain Monte Carlo (MCMC) inapplicable. Second, models and experimental paradigms in cognitive neuroscience often induce scenarios in which inference is repeated for varying numbers of experimental trials and changing hierarchical dependencies between model parameters (<xref ref-type="bibr" rid="bib30">Lee, 2011</xref>). 
As such, fitting computational models with arbitrary hierarchical structures to experimental data often still requires idiosyncratic and complex inference algorithms.</p><p>Approximate Bayesian computation (ABC, <xref ref-type="bibr" rid="bib60">Sisson, 2018</xref>) offers a solution to the first challenge by enabling Bayesian inference based on comparing simulated with experimental data, without the need to evaluate an explicit likelihood function. Accordingly, various ABC methods have been applied to and developed for models in cognitive neuroscience and related fields (<xref ref-type="bibr" rid="bib64">Turner and Van Zandt, 2012</xref>; <xref ref-type="bibr" rid="bib66">Turner and Van Zandt, 2018</xref>; <xref ref-type="bibr" rid="bib41">Palestro et al., 2009</xref>; <xref ref-type="bibr" rid="bib27">Kangasrääsiö et al., 2019</xref>). However, ABC methods are limited regarding the second challenge because they become inefficient as the number of model parameters increases (<xref ref-type="bibr" rid="bib36">Lueckmann et al., 2021</xref>) and require generating new simulations whenever the observed data or parameter dependencies change.</p><p>More recent approaches from the field of simulation-based inference (SBI, <xref ref-type="bibr" rid="bib10">Cranmer et al., 2020</xref>) have the potential to overcome these limitations by using machine learning algorithms such as neural networks. Recently, <xref ref-type="bibr" rid="bib16">Fengler et al., 2021</xref> presented an SBI algorithm for a specific problem in cognitive neuroscience—inference for drift-diffusion models (DDMs). They introduced a new approach, called likelihood approximation networks (LANs), which uses neural networks to predict log-likelihoods from data and parameters. The predicted likelihoods can subsequently be used to generate posterior samples using MCMC methods. LANs are trained in a three-step procedure. 
First, a set of <inline-formula><mml:math id="inf1"><mml:mi>N</mml:mi></mml:math></inline-formula> parameters is generated and for each of the <inline-formula><mml:math id="inf2"><mml:mi>N</mml:mi></mml:math></inline-formula> parameters the model is simulated <inline-formula><mml:math id="inf3"><mml:mi>M</mml:mi></mml:math></inline-formula> times. Second, for each of the <inline-formula><mml:math id="inf4"><mml:mi>N</mml:mi></mml:math></inline-formula> parameters, empirical likelihood targets are estimated from the <inline-formula><mml:math id="inf5"><mml:mi>M</mml:mi></mml:math></inline-formula> model simulations using kernel density estimation (KDE) or empirical histograms. Third, a training dataset consisting of parameters, data points, and empirical likelihood targets is constructed by augmenting the initial set of <inline-formula><mml:math id="inf6"><mml:mi>N</mml:mi></mml:math></inline-formula> parameters by a factor of 1000: for each parameter, 1000 data points and empirical likelihood targets are generated from the learned KDE. Finally, supervised learning is used to train a neural network to predict log-likelihoods, by minimizing a loss function (the Huber loss) between the network-predicted log-likelihoods and the (log of) the empirically estimated likelihoods. 
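The target-generation pipeline described above can be sketched in code. This is an illustrative reconstruction, not the authors' implementation: the toy `simulate` function, the Gaussian kernel density estimate and its bandwidth, and all dataset sizes are placeholder assumptions (the paper uses far larger budgets).

```python
import numpy as np

def simulate(theta, m, rng):
    # placeholder simulator: returns m scalar "observations" per parameter
    return rng.normal(loc=theta, scale=1.0, size=m)

def gaussian_kde_logpdf(x, samples, bw=0.2):
    # simple 1D Gaussian kernel density estimate evaluated at x
    d = (x[:, None] - samples[None, :]) / bw
    k = np.exp(-0.5 * d**2) / np.sqrt(2 * np.pi)
    return np.log(k.mean(axis=1) / bw + 1e-32)

rng = np.random.default_rng(0)
n_params, m_sims, n_aug = 100, 1000, 1000  # N parameters, M simulations each, 1000x augmentation

thetas = rng.uniform(-2.0, 2.0, size=n_params)
train_theta, train_x, train_loglik = [], [], []
for theta in thetas:
    sims = simulate(theta, m_sims, rng)  # step 1: M simulations per parameter
    # step 2: the simulations define a KDE; step 3: augment by sampling 1000
    # data points from that KDE and evaluating its log-density as the target
    xs = sims[rng.integers(0, m_sims, size=n_aug)] + rng.normal(0.0, 0.2, n_aug)
    ll = gaussian_kde_logpdf(xs, sims)
    train_theta.append(np.full(n_aug, theta))
    train_x.append(xs)
    train_loglik.append(ll)

# the (theta, x, log-likelihood target) triplets would then be used to train a
# regression network with a Huber loss on the predicted log-likelihoods
dataset = np.stack([np.concatenate(train_theta),
                    np.concatenate(train_x),
                    np.concatenate(train_loglik)], axis=1)
print(dataset.shape)  # (100000, 3)
```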
Overall, LANs require a large number of model simulations such that the histogram probability of each possible observed data point, for each possible combination of input parameters, can be accurately estimated—<inline-formula><mml:math id="inf7"><mml:mrow><mml:mi>N</mml:mi><mml:mo>⋅</mml:mo><mml:mi>M</mml:mi></mml:mrow></mml:math></inline-formula> model simulations, for example, <inline-formula><mml:math id="inf8"><mml:mrow><mml:mn>1.5</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn>10</mml:mn><mml:mn>6</mml:mn></mml:msup><mml:mo>×</mml:mo><mml:msup><mml:mn>10</mml:mn><mml:mn>5</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> (150 billion) for the examples used in <xref ref-type="bibr" rid="bib16">Fengler et al., 2021</xref>. The extremely high number of model simulations will make it infeasible for most users to run this training themselves, so that there would need to be a repository from which users can download pretrained LANs. This restricts the application of LANs to a small set of canonical models like DDMs, and prohibits customization and iteration of models by users. In addition, the high simulation requirement limits this approach to models whose parameters and observations are sufficiently low dimensional for histograms to be sampled densely.</p><p>To overcome these limitations, we propose an alternative approach called mixed neural likelihood estimation (MNLE). MNLE builds on recent advances in probabilistic machine learning, and in particular on the framework of neural likelihood estimation (<xref ref-type="bibr" rid="bib46">Papamakarios et al., 2019b</xref>; <xref ref-type="bibr" rid="bib35">Lueckmann et al., 2019</xref>), but is designed to specifically capture the mixed data types arising in models of decision-making, for example, discrete choices and continuous reaction times. 
Neural likelihood estimation has its origin in classical synthetic likelihood (SL) approaches (<xref ref-type="bibr" rid="bib75">Wood, 2010</xref>; <xref ref-type="bibr" rid="bib50">Price et al., 2018</xref>). Classical SL approaches assume the likelihood of the simulation-based model to be Gaussian (so that its moments can be estimated from model simulations) and then use MCMC methods for inference. This approach and various extensions of it have been widely used (<xref ref-type="bibr" rid="bib50">Price et al., 2018</xref>; <xref ref-type="bibr" rid="bib40">Ong et al., 2009</xref>; <xref ref-type="bibr" rid="bib1">An et al., 2019</xref>; <xref ref-type="bibr" rid="bib51">Priddle et al., 2022</xref>)—but inherently they need multiple model simulations for each parameter in the MCMC chain to estimate the associated likelihood.</p><p><italic>Neural</italic> likelihood approaches instead perform <italic>conditional</italic> density estimation, that is, they train a neural network to predict the parameters of the approximate likelihood conditioned on the model parameters (e.g., <xref ref-type="bibr" rid="bib46">Papamakarios et al., 2019b</xref>; <xref ref-type="bibr" rid="bib35">Lueckmann et al., 2019</xref>). By using a conditional density estimator, it is possible to exploit continuity across different model parameters, rather than having to learn a separate density for each individual parameter as in classical SL. Recent advances in conditional density estimation (such as normalizing flows, <xref ref-type="bibr" rid="bib45">Papamakarios et al., 2019a</xref>) further allow lifting the parametric assumptions of classical SL methods and learning flexible conditional density estimators which are able to model a wide range of densities, as well as highly nonlinear dependencies on the conditioning variable. 
In addition, neural likelihood estimators yield estimates of the probability density which are guaranteed to be non-negative and normalized, and which can be both sampled and evaluated, acting as a probabilistic emulator of the simulator (<xref ref-type="bibr" rid="bib35">Lueckmann et al., 2019</xref>).</p><p>Our approach, MNLE, uses neural likelihood estimation to learn an emulator of the simulator. The training phase is a simple two-step procedure: first, a training dataset of <inline-formula><mml:math id="inf9"><mml:mi>N</mml:mi></mml:math></inline-formula> parameters <inline-formula><mml:math id="inf10"><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:math></inline-formula> is sampled from a proposal distribution and corresponding model simulations <inline-formula><mml:math id="inf11"><mml:mi mathvariant="bold">x</mml:mi></mml:math></inline-formula> are generated. Second, the <inline-formula><mml:math id="inf12"><mml:mi>N</mml:mi></mml:math></inline-formula> parameter–data pairs <inline-formula><mml:math id="inf13"><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="bold">x</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> are directly used to train a conditional neural likelihood estimator to estimate <inline-formula><mml:math id="inf14"><mml:mrow><mml:mi>p</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi mathvariant="bold">x</mml:mi><mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>. 
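The two-step training procedure might look as follows in a minimal sketch. The toy simulator, the logistic model for the discrete choice, and the lognormal reaction-time family are illustrative stand-ins; MNLE itself uses a Bernoulli network and a conditional normalizing flow.

```python
import numpy as np

rng = np.random.default_rng(1)

def toy_simulator(theta):
    # hypothetical decision model: choice probability and reaction time depend on theta
    c = (rng.random(theta.shape) < 1 / (1 + np.exp(-theta))).astype(float)
    rt = np.exp(0.5 * theta + 0.3 * c + 0.2 * rng.standard_normal(theta.shape))
    return c, rt

# step 1: sample N parameters from a proposal and simulate the model once per parameter
theta = rng.uniform(-2, 2, size=20000)
c, rt = toy_simulator(theta)

# step 2a: discrete part p(c | theta), here a logistic model fit by gradient descent
w, b = 0.0, 0.0
for _ in range(2000):
    p = 1 / (1 + np.exp(-(w * theta + b)))
    w -= 0.1 * np.mean((p - c) * theta)
    b -= 0.1 * np.mean(p - c)

# step 2b: continuous part p(rt | c, theta); a lognormal with a linear mean
# function stands in for the conditional normalizing flow used by MNLE
A = np.stack([theta, c, np.ones_like(theta)], axis=1)
coef, *_ = np.linalg.lstsq(A, np.log(rt), rcond=None)
sigma = np.std(np.log(rt) - A @ coef)

def log_likelihood(theta, c, rt):
    # log p(x | theta) = log p(c | theta) + log p(rt | c, theta)
    p1 = 1 / (1 + np.exp(-(w * theta + b)))
    log_p_c = np.where(c == 1, np.log(p1), np.log(1 - p1))
    mu = coef[0] * theta + coef[1] * c + coef[2]
    z = (np.log(rt) - mu) / sigma
    return log_p_c - 0.5 * z**2 - np.log(sigma * rt * np.sqrt(2 * np.pi))
```

Note that no likelihood targets are needed: the parameter-data pairs themselves are the training data, which is what makes the approach simulation efficient.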
Like for LANs, the proposal distribution for the training data can be <italic>any</italic> distribution over <inline-formula><mml:math id="inf15"><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:math></inline-formula>, and should be chosen to cover all parameter values one expects to encounter in empirical data. Thus, the prior distribution used for Bayesian inference constitutes a useful choice, but in principle any distribution that contains the support of the prior can be used. To account for mixed data types, we learn the likelihood estimator as a mixed model composed of one neural density estimator for categorical data and one for continuous data, conditioned on the categorical data. This separation allows us to choose the appropriate neural density estimator for each data type, for example, a Bernoulli model for the categorical data and a normalizing flow (<xref ref-type="bibr" rid="bib45">Papamakarios et al., 2019a</xref>) for the continuous data. The resulting joint density estimator gives access to the likelihood, which enables inference via MCMC methods. See <xref ref-type="fig" rid="fig1">Figure 1</xref> for an illustration of our approach, and Methods and materials for details.</p><fig id="fig1" position="float"><label>Figure 1.</label><caption><title>Training a neural density estimator on simulated data to perform parameter inference.</title><p>Our goal is to perform Bayesian inference on models of decision-making for which likelihoods cannot be evaluated (here a drift-diffusion model for illustration, left). We train a neural density estimation network on synthetic data generated by the model, to provide access to (estimated) likelihoods. Our neural density estimators are designed to account for the mixed data types of decision-making models (e.g., discrete valued choices and continuous valued reaction times, middle). 
The estimated likelihoods can then be used for inference with standard Markov Chain Monte Carlo (MCMC) methods, that is, to obtain samples from the posterior over the parameters of the simulator given experimental data (right). Once trained, our method can be applied to flexible inference scenarios like varying numbers of trials or hierarchical inference without having to retrain the density estimator.</p></caption><graphic mimetype="image" mime-subtype="jpeg" xlink:href="elife-77220.xml.media/fig1.jpg"/></fig><p>Both LANs and MNLEs allow for flexible inference scenarios common in cognitive neuroscience, for example, varying numbers of trials with the same underlying experimental conditions or hierarchical inference, and need to be trained only once. However, there is a key difference between the two approaches. LANs use feed-forward neural networks to perform regression from model parameters to empirical likelihood targets obtained from KDE. MNLE instead learns the likelihood directly by performing conditional density estimation on the simulated data without requiring likelihood targets. This makes MNLE by design more simulation efficient than LANs—we demonstrate empirically that it can learn likelihood estimators which are as good as or better than those reported in the LAN paper, but using a factor of 1,000,000 fewer simulations (<xref ref-type="bibr" rid="bib16">Fengler et al., 2021</xref>). When using the same simulation budget for both approaches, MNLE substantially outperforms LAN across several performance metrics. Moreover, MNLE results in a density estimator that is guaranteed to correspond to a valid probability distribution and can also act as an emulator that can generate synthetic data without running the simulator. 
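Plugging an estimated likelihood into MCMC can be sketched with a random-walk Metropolis-Hastings sampler. The `emulator_log_likelihood` below is a hypothetical stand-in for a trained density estimator (here simply a Gaussian), and i.i.d. trials and a uniform prior are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def emulator_log_likelihood(theta, trials):
    # stand-in for a trained neural likelihood estimator log q(x | theta);
    # trials are assumed Gaussian around theta, purely for illustration
    return np.sum(-0.5 * (trials - theta) ** 2 - 0.5 * np.log(2 * np.pi))

def log_prior(theta):
    return 0.0 if -5 < theta < 5 else -np.inf  # uniform prior on (-5, 5)

trials = rng.normal(1.5, 1.0, size=100)  # "observed" i.i.d. trials

# random-walk Metropolis-Hastings over the model parameter
theta, samples = 0.0, []
log_post = emulator_log_likelihood(theta, trials) + log_prior(theta)
for _ in range(5000):
    prop = theta + 0.2 * rng.standard_normal()
    log_post_prop = emulator_log_likelihood(prop, trials) + log_prior(prop)
    if np.log(rng.random()) < log_post_prop - log_post:
        theta, log_post = prop, log_post_prop
    samples.append(theta)

posterior = np.array(samples[1000:])  # discard burn-in
```

Because the emulator is trained once, changing the observed trials (or adding hierarchical structure to the prior) only changes this sampling step, not the training.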
The simulation efficiency of MNLEs allows users to explore and iterate on their own models without generating a massive training dataset, rather than restricting their investigations to canonical models for which pretrained networks have been provided by a central resource. To facilitate this process, we implemented our method as an extension to an open-source toolbox for SBI methods (<xref ref-type="bibr" rid="bib62">Tejero-Cantero et al., 2020</xref>), and provide detailed documentation and tutorials.</p></sec><sec id="s2" sec-type="results"><title>Results</title><sec id="s2-1"><title>Evaluating the performance of MNLE on the DDM</title><p>We first demonstrate the efficiency and performance of MNLE on a classical model of decision-making, the DDM (<xref ref-type="bibr" rid="bib55">Ratcliff and McKoon, 2008</xref>). The DDM is an influential phenomenological model of a two-alternative perceptual decision-making task. It simulates the evolution of an internal decision variable that integrates sensory evidence until one of two decision boundaries is reached and a choice is made (<xref ref-type="fig" rid="fig1">Figure 1</xref>, left). The decision variable is modeled with a stochastic differential equation which, in the ‘simple’ DDM version (as used in <xref ref-type="bibr" rid="bib16">Fengler et al., 2021</xref>), has four parameters: the drift rate <inline-formula><mml:math id="inf16"><mml:mi>v</mml:mi></mml:math></inline-formula>, boundary separation <inline-formula><mml:math id="inf17"><mml:mi>a</mml:mi></mml:math></inline-formula>, the starting point <inline-formula><mml:math id="inf18"><mml:mi>w</mml:mi></mml:math></inline-formula> of the decision variable, and the non-decision time <italic>τ</italic>. 
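The DDM dynamics just described can be sketched with Euler-Maruyama integration. This is an illustrative simulator assuming unit noise variance, absorbing boundaries at 0 and a, and a starting point expressed as the fraction w of the boundary separation; it is not the implementation used in the paper.

```python
import numpy as np

def simulate_ddm(v, a, w, tau, dt=1e-3, max_t=10.0, rng=None):
    """Euler-Maruyama sketch of the simple DDM with unit diffusion
    coefficient and absorbing boundaries at 0 and a (assumptions)."""
    rng = rng or np.random.default_rng()
    x = w * a          # starting point as a fraction of the boundary separation
    t = 0.0
    sqrt_dt = np.sqrt(dt)
    while 0.0 < x < a and t < max_t:
        x += v * dt + sqrt_dt * rng.standard_normal()
        t += dt
    choice = 1 if x >= a else 0   # upper boundary -> "up" choice
    return choice, tau + t        # reaction time includes the non-decision time

rng = np.random.default_rng(3)
data = [simulate_ddm(v=1.0, a=1.5, w=0.5, tau=0.3, rng=rng) for _ in range(500)]
choices = np.array([c for c, _ in data])
rts = np.array([rt for _, rt in data])
```

With a positive drift rate, most trajectories hit the upper boundary, and every reaction time exceeds the non-decision time τ, matching rt ∈ (τ, ∞).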
Given these four parameters <inline-formula><mml:math id="inf19"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mi>v</mml:mi><mml:mo>,</mml:mo><mml:mi>a</mml:mi><mml:mo>,</mml:mo><mml:mi>w</mml:mi><mml:mo>,</mml:mo><mml:mi>τ</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mstyle></mml:math></inline-formula>, a single simulation of the DDM returns data <inline-formula><mml:math id="inf20"><mml:mi mathvariant="bold">x</mml:mi></mml:math></inline-formula> containing a choice <inline-formula><mml:math id="inf21"><mml:mrow><mml:mi>c</mml:mi><mml:mo>∈</mml:mo><mml:mrow><mml:mo stretchy="false">{</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">}</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> and the corresponding reaction time in seconds <inline-formula><mml:math id="inf22"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>r</mml:mi><mml:mi>t</mml:mi><mml:mo>∈</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mi>τ</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">∞</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mstyle></mml:math></inline-formula>.</p></sec><sec id="s2-2"><title>MNLE learns accurate likelihoods with a fraction of the simulation budget</title><p>The simple version of the DDM is the ideal candidate for comparing the performance of different inference methods because the likelihood of an observation given the parameters, <inline-formula><mml:math id="inf23"><mml:mrow><mml:mi>L</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi mathvariant="bold">x</mml:mi><mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>, <italic>can</italic> be calculated 
analytically (<xref ref-type="bibr" rid="bib38">Navarro and Fuss, 2009</xref>, in contrast to more complicated versions of the DDM, e.g., <xref ref-type="bibr" rid="bib54">Ratcliff and Rouder, 1998</xref>; <xref ref-type="bibr" rid="bib67">Usher and McClelland, 2001</xref>; <xref ref-type="bibr" rid="bib56">Reynolds and Rhodes, 2009</xref>). This enabled us to evaluate MNLE’s performance with respect to the analytical likelihoods and the corresponding inferred posteriors of the DDM, and to compare against that of LANs on a range of simulation budgets. For MNLE, we used a budget of 10<sup>5</sup> simulations (henceforth referred to as MNLE<sup>5</sup>), for LANs we used budgets of 10<sup>5</sup> and 10<sup>8</sup> simulations (LAN<sup>5</sup> and LAN<sup>8</sup>, respectively, trained by us) and the pretrained version based on 10<sup>11</sup> simulations (LAN<sup>11</sup>) provided by <xref ref-type="bibr" rid="bib16">Fengler et al., 2021</xref>.</p><p>First, we evaluated the quality of likelihood approximations of MNLE<sup>5</sup>, and compared it to that of LAN<sup>{5,8,11}</sup>. Both MNLEs and LANs were in principle able to accurately approximate the likelihoods for both decisions and a wide range of reaction times (see <xref ref-type="fig" rid="fig2">Figure 2a</xref> for an example, and Details of the numerical comparison). 
However, LANs require a much larger simulation budget than MNLE to achieve accurate likelihood approximations, that is, LANs trained with 10<sup>5</sup> or 10<sup>8</sup> simulations show visible deviations, both in the linear and in log-domain (<xref ref-type="fig" rid="fig2">Figure 2a</xref>, lines for LAN<sup>5</sup> and LAN<sup>8</sup>).</p><fig-group><fig id="fig2" position="float"><label>Figure 2.</label><caption><title>Mixed neural likelihood estimation (MNLE) estimates accurate likelihoods for the drift-diffusion model (DDM).</title><p>The classical DDM simulates reaction times and choices of a two-alternative decision task and has an analytical likelihood which can be used for comparing the likelihood approximations of MNLE and likelihood approximation network (LAN). We compared MNLE trained with a budget of 10<sup>5</sup> simulations (green, MNLE<sup>5</sup>) to LAN trained with budgets of 10<sup>5</sup>, 10<sup>8</sup>, and 10<sup>11</sup> simulations (shades of orange, LAN<sup>{5,8,11}</sup>, respectively). (<bold>a</bold>) Example likelihood for a fixed parameter <inline-formula><mml:math id="inf24"><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:math></inline-formula> over a range of reaction times (reaction times for down- and up-choices shown toward the left and right, respectively). Shown on a linear scale (top panel) and a logarithmic scale (bottom panel). (<bold>b</bold>) Huber loss between analytical and estimated likelihoods calculated for a fixed simulated data point over 1000 test parameters sampled from the prior, averaged over 100 data points (lower is better). Bar plot error bars show standard error of the mean. (<bold>c</bold>) Same as in (<bold>b</bold>), but using mean-squared error (MSE) over likelihoods (lower is better). (<bold>d</bold>) Huber loss on the log-likelihoods (LAN’s training loss). 
(<bold>e</bold>) MSE on the log-likelihoods.</p></caption><graphic mimetype="image" mime-subtype="jpeg" xlink:href="elife-77220.xml.media/fig2.jpg"/></fig><fig id="fig2s1" position="float" specific-use="child-fig"><label>Figure 2—figure supplement 1.</label><caption><title>Comparison of simulated drift-diffusion model (DDM) data and synthetic data sampled from the mixed neural likelihood estimation (MNLE) emulator.</title><p>Histograms of reaction times from 1000 independent and identically distributed (i.i.d.) trials generated from three different parameters sampled from the prior (panels <bold>a–c</bold>) using the original DDM simulator (purple) and the emulator obtained from MNLE (green). ‘Down’ choices are shown to the left of zero and ‘up’ choices to the right of zero.</p></caption><graphic mimetype="image" mime-subtype="jpeg" xlink:href="elife-77220.xml.media/fig2-figsupp1.jpg"/></fig></fig-group><p>To quantify the quality of likelihood approximation, we calculated the Huber loss and the mean-squared error (MSE) between the true and approximated likelihoods (<xref ref-type="fig" rid="fig2">Figure 2b, c</xref>), as well as between the <italic>log</italic>-likelihoods (<xref ref-type="fig" rid="fig2">Figure 2d, e</xref>). The metrics were calculated as averages over (log-)likelihoods of a fixed observation given 1000 parameters sampled from the prior, repeated for 100 observations simulated from the DDM. For metrics calculated on the untransformed likelihoods (<xref ref-type="fig" rid="fig2">Figure 2b, c</xref>), we found that MNLE<sup>5</sup> was more accurate than LAN<sup>{5,8,11}</sup> on all simulation budgets, showing smaller Huber loss than LAN<sup>{5,8,11}</sup> in 99, 81, and 66 out of 100 comparisons, and smaller MSE than LAN<sup>{5,8,11}</sup> on 98, 81, and 66 out of 100 comparisons, respectively. 
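The two discrepancy metrics used above can be written out directly. In the following sketch the Huber threshold `delta` is an illustrative assumption, as its value is not stated in this section:

```python
def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss: quadratic for residuals up to delta, linear beyond."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        r = abs(t - p)
        total += 0.5 * r ** 2 if r <= delta else delta * (r - 0.5 * delta)
    return total / len(y_true)

def mse(y_true, y_pred):
    """Mean-squared error between true and approximated (log-)likelihoods."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# toy 'analytical' vs. 'approximated' likelihood values at three test points
true_l = [0.1, 0.4, 0.2]
approx_l = [0.1, 0.5, 0.2]
huber = huber_loss(true_l, approx_l)
err = mse(true_l, approx_l)
```

Both metrics can be evaluated either on the likelihoods themselves or, after taking logarithms, on the log-likelihoods, as in the four panels of Figure 2b–e.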
Similarly, for the MSE calculated on the log-likelihoods (<xref ref-type="fig" rid="fig2">Figure 2e</xref>), MNLE<sup>5</sup> achieved smaller MSE than LAN<sup>{5,8,11}</sup> on 100, 100, and 75 out of 100 comparisons, respectively. For the Huber loss calculated on the log-likelihoods (<xref ref-type="fig" rid="fig2">Figure 2d</xref>), we found that MNLE<sup>5</sup> was more accurate than LAN<sup>5</sup> and LAN<sup>8</sup>, but slightly less accurate than LAN<sup>11</sup>, showing smaller Huber loss than LAN<sup>{5,8}</sup> in all 100 comparisons, and larger Huber loss than LAN<sup>11</sup> in 62 out of 100 comparisons. All the above pairwise comparisons were significant under the binomial test (p < 0.01), but note that these are simulated data and therefore the p value can be arbitrarily inflated by increasing the number of comparisons. We also note that the Huber loss on the log-likelihoods is the loss which is directly optimized by LANs, and thus this comparison should in theory favor LANs over alternative approaches. Furthermore, the MNLE<sup>5</sup> results shown here represent averages over 10 random neural network initializations (five of which achieved smaller Huber loss than LAN<sup>11</sup>), whereas the LAN<sup>11</sup> results are based on a single pretrained network. Finally, we also investigated MNLE’s ability to act as an emulator of the simulator and found the synthetic reaction times and choices generated from the MNLE emulator to accurately match corresponding data simulated from the DDM (see <xref ref-type="fig" rid="fig2s1">Figure 2—figure supplement 1</xref> and Appendix 1).</p><p>When using the learned likelihood estimators for inference with MCMC methods, their evaluation speed can also be important because MCMC often requires thousands of likelihood evaluations. 
We found that evaluating MNLE for a batch of 100 trials and 10 model parameters (as used during MCMC) took 4.14± (mean over 100 repetitions ± standard error of the mean), compared to 1.02± for LANs, that is, MNLE incurred a larger computational footprint at evaluation time. Note that these timings are based on an improved implementation of LANs compared to the one originally presented in <xref ref-type="bibr" rid="bib16">Fengler et al., 2021</xref>, and evaluation times can depend on the implementation, compute infrastructure, and parameter settings (see Details of the numerical comparison and Discussion). In summary, we found that MNLE trained with 10<sup>5</sup> simulations performed substantially better than LANs trained with 10<sup>5</sup> or 10<sup>8</sup> simulations, and similarly well or better than LANs trained with 10<sup>11</sup> simulations, on all likelihood approximation accuracy metrics.</p></sec><sec id="s2-3"><title>MNLE enables accurate flexible posterior inference with MCMC</title><p>In the previous section, we showed that MNLE requires substantially fewer training simulations than LANs to approximate the likelihood accurately. To investigate whether these likelihood estimates were accurate enough to support reliable parameter inference, we evaluated the quality of the resulting posteriors, using a framework for benchmarking SBI algorithms (<xref ref-type="bibr" rid="bib36">Lueckmann et al., 2021</xref>). We used the analytical likelihoods of the simple DDM to obtain reference posteriors for 100 different observations, via MCMC sampling. Each observation consisted of 100 independent and identically distributed (i.i.d.) trials simulated with parameters sampled from the prior (see <xref ref-type="fig" rid="fig3">Figure 3a</xref> for an example, details in Materials and methods). 
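The inference scheme itself is generic: the (approximate) likelihood enters MCMC only through a summed log-likelihood over the i.i.d. trials. A minimal random-walk Metropolis sketch, with a toy Gaussian single-trial likelihood standing in for the learned DDM likelihood (all names and settings are illustrative):

```python
import math
import random

def log_likelihood(theta, trials):
    # stand-in for the learned likelihood: unit-variance Gaussian per trial
    return sum(-0.5 * (x - theta) ** 2 - 0.5 * math.log(2 * math.pi)
               for x in trials)

def metropolis(trials, n_steps=4000, step=0.2, lo=-10.0, hi=10.0, seed=1):
    """Random-walk Metropolis with a uniform prior on [lo, hi]."""
    rng = random.Random(seed)
    theta = 0.0
    logp = log_likelihood(theta, trials)  # flat prior cancels in the ratio
    samples = []
    for _ in range(n_steps):
        prop = theta + rng.gauss(0.0, step)
        if lo <= prop <= hi:
            logp_prop = log_likelihood(prop, trials)
            if math.log(rng.random()) < logp_prop - logp:
                theta, logp = prop, logp_prop
        samples.append(theta)
    return samples[n_steps // 2:]  # discard first half as burn-in

rng = random.Random(0)
trials = [rng.gauss(1.5, 1.0) for _ in range(100)]  # 100 i.i.d. 'observed' trials
posterior = metropolis(trials)
post_mean = sum(posterior) / len(posterior)
```

Because only `log_likelihood` changes, the same sampler can be driven by an analytical likelihood, MNLE, or a LAN, which is what makes the head-to-head posterior comparison possible.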
We then performed inference using MCMC based on the approximate likelihoods obtained with MNLE (10<sup>5</sup> budget, MNLE<sup>5</sup>) and the ones obtained with LAN for each of the three simulation budgets (LAN <inline-formula><mml:math id="inf25"><mml:msup><mml:mi/><mml:mrow><mml:mo stretchy="false">{</mml:mo><mml:mn>5</mml:mn><mml:mo>,</mml:mo><mml:mn>8</mml:mn><mml:mo>,</mml:mo><mml:mn>11</mml:mn><mml:mo stretchy="false">}</mml:mo></mml:mrow></mml:msup></mml:math></inline-formula>, respectively).</p><fig-group><fig id="fig3" position="float"><label>Figure 3.</label><caption><title>Mixed neural likelihood estimation (MNLE) infers accurate posteriors for the drift-diffusion model.</title><p>Posteriors were obtained given 100-trial independent and identically distributed (i.i.d.) observations with Markov Chain Monte Carlo (MCMC) using analytical (i.e., reference) likelihoods, or those approximated using LAN<sup>{5,8,11}</sup> trained with simulation budgets 10<sup>{5,8,11}</sup>, respectively, and MNLE<sup>5</sup> trained with a budget of 10<sup>5</sup> simulations. (<bold>a</bold>) Posteriors given an example observation generated from the prior and the simulator, shown as 95% contour lines in a corner plot, that is, one-dimensional marginal (diagonal) and all pairwise two-dimensional marginals (upper triangle). (<bold>b</bold>) Difference in posterior sample mean of approximate (LAN<sup>{5,8,11}</sup>, MNLE<sup>5</sup>) and reference posteriors (normalized by reference posterior standard deviation, lower is better). (<bold>c</bold>) Same as in (<bold>b</bold>) but for posterior sample variance (normalized by reference posterior variance, lower is better). (<bold>d</bold>) Parameter estimation error measured as mean-squared error (MSE) between posterior sample mean and the true underlying parameters (smallest possible error is given by reference posterior performance shown in blue). 
(<bold>e</bold>) Classification 2-sample test (C2ST) score between approximate (LAN<sup>{5,8,11}</sup>, MNLE<sup>5</sup>) and reference posterior samples (0.5 is best). All bar plots show metrics calculated from 100 repetitions with different observations; error bars show standard error of the mean.</p></caption><graphic mimetype="image" mime-subtype="jpeg" xlink:href="elife-77220.xml.media/fig3.jpg"/></fig><fig id="fig3s1" position="float" specific-use="child-fig"><label>Figure 3—figure supplement 1.</label><caption><title>Drift-diffusion model (DDM) inference accuracy metrics for individual model parameters.</title><p>Inference accuracy given a 100-trial observation, measured as posterior mean accuracy (first row of panels), posterior variance (second row), and parameter estimation error (third row), shown in absolute terms, for each of the four DDM parameters separately (panels a, b, c, and d, respectively), and for each simulation budget of LAN<sup>{5,8,11}</sup> (shades of orange) and for MNLE<sup>5</sup> trained with 10<sup>5</sup> simulations (green). Bars show the mean metric over 100 different observations; error bars show standard error of the mean.</p></caption><graphic mimetype="image" mime-subtype="jpeg" xlink:href="elife-77220.xml.media/fig3-figsupp1.jpg"/></fig><fig id="fig3s2" position="float" specific-use="child-fig"><label>Figure 3—figure supplement 2.</label><caption><title>Drift-diffusion model (DDM) example posteriors and parameter recovery for likelihood approximation networks (LANs) trained with smaller simulation budgets.</title><p>(<bold>a</bold>) Posterior samples given a 100-trial example observation, obtained with Markov Chain Monte Carlo (MCMC) using LAN approximate likelihoods trained based on 10<sup>5</sup> (LAN<sup>5</sup>) and 10<sup>8</sup> simulations (LAN<sup>8</sup>), and with the analytical likelihoods (reference). 
(<bold>b</bold>) Parameter recovery of LAN and the reference posterior shown as posterior sample means against the underlying ground-truth parameters.</p></caption><graphic mimetype="image" mime-subtype="jpeg" xlink:href="elife-77220.xml.media/fig3-figsupp2.jpg"/></fig><fig id="fig3s3" position="float" specific-use="child-fig"><label>Figure 3—figure supplement 3.</label><caption><title>Drift-diffusion model (DDM) inference accuracy metrics for different numbers of observed trials.</title><p>Parameter estimation error (<bold>a</bold>), absolute posterior mean error (<bold>b</bold>), relative posterior variance error (<bold>c</bold>), and classification 2-sample test (C2ST) scores (<bold>d</bold>) shown for likelihood approximation network (LAN) with increasing simulation budgets (shades of orange, LAN<sup>{5,8,11}</sup>), mixed neural likelihood estimation (MNLE) trained with 10<sup>5</sup> simulations (green), and MNLE ensembles (purple). Metrics were calculated from 10,000 posterior samples and with respect to the reference posterior, averaged over 100 different observations. Error bars show standard error of the mean.</p></caption><graphic mimetype="image" mime-subtype="jpeg" xlink:href="elife-77220.xml.media/fig3-figsupp3.jpg"/></fig></fig-group><p>Overall, we found that the likelihood approximation performances presented above were reflected in the inference performances: MNLE<sup>5</sup> performed substantially better than LAN<sup>5</sup> and LAN<sup>8</sup>, and equally well or better than LAN<sup>11</sup> (<xref ref-type="fig" rid="fig3">Figure 3b–d</xref>). In particular, MNLE<sup>5</sup> approximated the posterior mean more accurately than LAN<sup>{5,8,11}</sup> (<xref ref-type="fig" rid="fig3">Figure 3b</xref>), being more accurate than LAN<sup>{5,8,11}</sup> in 100, 90, and 67 out of 100 comparisons, respectively. 
In terms of posterior variance, MNLE<sup>5</sup> performed better than LAN<sup>{5,8}</sup> and on par with LAN<sup>11</sup> (<xref ref-type="fig" rid="fig3">Figure 3c</xref>), being more accurate than LAN<sup>{5,8,11}</sup> in 100, 93 (p < 0.01, binomial test), and 58 (<inline-formula><mml:math id="inf26"><mml:mrow><mml:mi>p</mml:mi><mml:mo>=</mml:mo><mml:mn>0.13</mml:mn></mml:mrow></mml:math></inline-formula>) out of 100 pairwise comparisons, respectively.</p><p>Additionally, we measured the parameter estimation accuracy as the MSE between the posterior mean and the ground-truth parameters underlying the observed data. We found that MNLE<sup>5</sup> estimation error was indistinguishable from that of the reference posterior, and that LAN performance was similar only for the substantially larger simulation budget of LAN<sup>11</sup> (<xref ref-type="fig" rid="fig3">Figure 3d</xref>), with MNLE being closer to reference performance than LAN<sup>{5,8,11}</sup> in 100, 91, and 66 out of 100 comparisons, respectively (all p < 0.01). Note that all three metrics were reported as averages over the four parameter dimensions of the DDM to keep the visualizations compact, and that this average did not affect the results qualitatively. We report metrics for each dimension in <xref ref-type="fig" rid="fig3s1">Figure 3—figure supplement 1</xref>, as well as additional inference accuracy results for smaller LAN simulation budgets (<xref ref-type="fig" rid="fig3s2">Figure 3—figure supplement 2</xref>) and for different numbers of observed trials (<xref ref-type="fig" rid="fig3s3">Figure 3—figure supplement 3</xref>).</p><p>Finally, we used the classifier 2-sample test (C2ST, <xref ref-type="bibr" rid="bib33">Lopez-Paz and Oquab, 2017</xref>; <xref ref-type="bibr" rid="bib36">Lueckmann et al., 2021</xref>) to quantify the similarity between the estimated and reference posterior distributions. 
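A minimal sketch of such a 2-sample test, using a leave-one-out 1-nearest-neighbour classifier on one-dimensional samples in place of the trained neural-network classifier used in the benchmark (illustrative only):

```python
import random

def c2st_knn(samples_p, samples_q):
    """Leave-one-out 1-nearest-neighbour classification accuracy between two
    1-D sample sets; 0.5 = indistinguishable, 1.0 = perfectly separable."""
    data = [(x, 0) for x in samples_p] + [(x, 1) for x in samples_q]
    correct = 0
    for i, (x, label) in enumerate(data):
        # the nearest other point 'votes' for this point's label
        nn = min((d for j, d in enumerate(data) if j != i),
                 key=lambda other: abs(other[0] - x))
        correct += nn[1] == label
    return correct / len(data)

rng = random.Random(0)
same_a = [rng.gauss(0, 1) for _ in range(200)]
same_b = [rng.gauss(0, 1) for _ in range(200)]
far = [rng.gauss(10, 1) for _ in range(200)]
score_same = c2st_knn(same_a, same_b)  # near 0.5: same distribution
score_far = c2st_knn(same_a, far)      # near 1.0: clearly distinct
```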
The C2ST is defined as the accuracy of a classifier trained to distinguish whether samples belong to the true or the estimated posterior. Thus, it ranges from 0.5 (no difference between the distributions, the classifier is at chance level) to 1.0 (the classifier can perfectly distinguish the two distributions). We note that the C2ST is a highly sensitive measure of discrepancy between two multivariate distributions—for example, if the two distributions differ in <italic>any</italic> dimension, the C2ST will be close to 1 even if all other dimensions match perfectly. We found that neither of the two approaches was able to achieve perfect approximations, but that MNLE<sup>5</sup> achieved lower C2ST scores compared to LAN<sup>{5,8,11}</sup> on all simulation budgets (<xref ref-type="fig" rid="fig3">Figure 3e</xref>): mean C2ST scores of 0.96, 0.78, and 0.70 for LAN<sup>{5,8,11}</sup> vs. 0.65 for MNLE<sup>5</sup>, with MNLE<sup>5</sup> showing a better score than LAN<sup>{5,8,11}</sup> on 100, 91, and 68 out of 100 pairwise comparisons, respectively (all p < 0.01). In summary, MNLE achieves more accurate recovery of posterior means than LANs, similar or better recovery of posterior variances, and overall more accurate posteriors (as quantified by C2ST).</p></sec><sec id="s2-4"><title>MNLE posteriors have uncertainties which are well calibrated</title><p>For practical applications of inference, it is often desirable to know how well an inference procedure can recover the ground-truth parameters, and whether the uncertainty estimates are well calibrated (<xref ref-type="bibr" rid="bib9">Cook et al., 2006</xref>), that is, that the <italic>uncertainty</italic> estimates of the posterior are balanced, and neither over-confident nor under-confident. 
For the DDM, we found that the posteriors inferred with MNLE and LANs (when using LAN<sup>11</sup>) recovered the ground-truth parameters accurately (in terms of posterior means, <xref ref-type="fig" rid="fig3">Figure 3d</xref> and <xref ref-type="fig" rid="fig4">Figure 4a</xref>)—in fact, parameter recovery was similarly accurate to using the ‘true’ analytical likelihoods, indicating that much of the residual error is due to stochasticity of the observations, and not the inaccuracy of the likelihood approximations.</p><fig-group><fig id="fig4" position="float"><label>Figure 4.</label><caption><title>Parameter recovery and posterior uncertainty calibration for the drift-diffusion model (DDM).</title><p>(<bold>a</bold>) Underlying ground-truth DDM parameters plotted against the sample mean of posterior samples inferred with the analytical likelihoods (reference, blue crosses), likelihood approximation network (LAN; orange circles), and mixed neural likelihood estimation (MNLE; green circles), for 100 different observations. Markers close to diagonal indicate good recovery of ground-truth parameters; circles on top of blue reference crosses indicate accurate posterior means. (<bold>b</bold>) Simulation-based calibration results showing empirical cumulative density functions (CDF) of the ground-truth parameters ranked under the inferred posteriors calculated from 100 different observations. A well-calibrated posterior must have uniformly distributed ranks, as indicated by the area shaded gray. 
Shown for reference posteriors (blue), LAN posteriors obtained with increasing simulation budgets (shades of orange, LAN<sup>{5,8,11}</sup>), and MNLE posterior (green, MNLE<sup>5</sup>), and for each parameter separately (<inline-formula><mml:math id="inf27"><mml:mi>v</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="inf28"><mml:mi>a</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="inf29"><mml:mi>w</mml:mi></mml:math></inline-formula>, and <italic>τ</italic>).</p></caption><graphic mimetype="image" mime-subtype="jpeg" xlink:href="elife-77220.xml.media/fig4.jpg"/></fig><fig id="fig4s1" position="float" specific-use="child-fig"><label>Figure 4—figure supplement 1.</label><caption><title>Drift-diffusion model (DDM) parameter recovery for different numbers of observed trials.</title><p>True underlying DDM parameters plotted against posterior sample means for 1, 10, and 1000 observed independent and identically distributed (i.i.d.) trial(s) (panels a–c, respectively) and for the four DDM parameters <inline-formula><mml:math id="inf30"><mml:mi>v</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="inf31"><mml:mi>a</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="inf32"><mml:mi>w</mml:mi></mml:math></inline-formula>, and <italic>τ</italic> (in columns). Calculated from 10,000 posterior samples obtained with Markov Chain Monte Carlo (MCMC) using the reference (blue), LAN<sup>11</sup> (orange), and MNLE<sup>5</sup> (green) likelihoods. 
Black line shows the identity function indicating perfect recovery.</p></caption><graphic mimetype="image" mime-subtype="jpeg" xlink:href="elife-77220.xml.media/fig4-figsupp1.jpg"/></fig><fig id="fig4s2" position="float" specific-use="child-fig"><label>Figure 4—figure supplement 2.</label><caption><title>Drift-diffusion model (DDM) simulation-based calibration (SBC) results for different numbers of observed trials.</title><p>SBC results in the form of empirical cumulative density functions of the ranks of ground-truth parameters under the approximate posterior samples. We compared posterior samples based on analytical likelihoods (blue), LAN<sup>11</sup> (orange), and MNLE<sup>5</sup> (green), and an ensemble of five MNLEs (MNLE<sup>5*</sup>, purple); for each of the four parameters of the DDM and for 1, 10, and 1000 observed trials (panels a, b, and c, respectively). Gray areas show random deviations expected under uniformly distributed ranks (ideal case).</p></caption><graphic mimetype="image" mime-subtype="jpeg" xlink:href="elife-77220.xml.media/fig4-figsupp2.jpg"/></fig></fig-group><p>To assess posterior calibration, we used simulation-based calibration (SBC, <xref ref-type="bibr" rid="bib61">Talts et al., 2018</xref>). The basic idea of SBC is the following: If one repeats the inference with simulations from many different prior samples, then, with a well-calibrated inference method, the combined samples from all the inferred posteriors should be distributed according to the prior. One way to test this is to calculate the rank of each ground-truth parameter (samples from the prior) under its corresponding posterior, and to check whether all the ranks follow a uniform distribution. 
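This rank procedure can be sketched on a toy conjugate-Gaussian model for which exact posterior samples are available (all settings are illustrative):

```python
import random

def sbc_ranks(n_repeats=200, n_post=99, n_trials=10, seed=0):
    """Simulation-based calibration for a Gaussian mean with a Gaussian prior.

    For each repeat: draw theta* from the prior, simulate data, draw exact
    posterior samples, and record the rank of theta* among them. With a
    well-calibrated posterior, ranks are uniform on 0..n_post.
    """
    rng = random.Random(seed)
    prior_mu, prior_var, lik_var = 0.0, 1.0, 1.0
    ranks = []
    for _ in range(n_repeats):
        theta_star = rng.gauss(prior_mu, prior_var ** 0.5)
        data = [rng.gauss(theta_star, lik_var ** 0.5) for _ in range(n_trials)]
        # conjugate Gaussian posterior over the mean
        post_var = 1.0 / (1.0 / prior_var + n_trials / lik_var)
        post_mu = post_var * (prior_mu / prior_var + sum(data) / lik_var)
        post = [rng.gauss(post_mu, post_var ** 0.5) for _ in range(n_post)]
        ranks.append(sum(s < theta_star for s in post))
    return ranks

ranks = sbc_ranks()
mean_rank = sum(ranks) / len(ranks)
```

In the paper's setting, the exact posterior draws are replaced by MCMC samples obtained with the analytical, LAN, or MNLE likelihoods, and the rank histogram (or its empirical CDF) is compared against uniformity.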
SBC results indicated that MNLE posteriors were as well calibrated as the reference posteriors, that is, the empirical cumulative density functions of the ranks were close to that of a uniform distribution (<xref ref-type="fig" rid="fig4">Figure 4b</xref>)—thus, on this example, MNLE inferences would likely be of similar quality to those obtained using the analytical likelihoods. When trained with the large simulation budget of 10<sup>11</sup> simulations, LANs, too, appeared to recover most of the ground-truth parameters well. However, SBC detected a systematic underestimation of the parameter <inline-formula><mml:math id="inf33"><mml:mi>a</mml:mi></mml:math></inline-formula> and overestimation of the parameter <italic>τ</italic>, and this bias increased for the smaller simulation budgets of LAN<sup>5</sup> and LAN<sup>8</sup> (<xref ref-type="fig" rid="fig4">Figure 4b</xref>, see the deviation below and above the desired uniform distribution of ranks, respectively).</p><p>The results so far (i.e., <xref ref-type="fig" rid="fig3">Figures 3</xref> and <xref ref-type="fig" rid="fig4">4</xref>) indicate that both LAN<sup>11</sup> and MNLE<sup>5</sup> lead to similar parameter recovery, but only MNLE<sup>5</sup> leads to posteriors which were well calibrated for all parameters. These results were obtained using a scenario with 100 i.i.d. trials. When increasing the number of trials (e.g., to 1000 trials), posteriors become very concentrated around the ground-truth value. In that case, while the posteriors overall identified the ground-truth parameter value very well (<xref ref-type="fig" rid="fig4s1">Figure 4—figure supplement 1c</xref>), even small deviations in the posteriors can have large effects on the posterior metrics (<xref ref-type="fig" rid="fig3s3">Figure 3—figure supplement 3</xref>). This effect was also detected by SBC, showing systematic biases for some parameters (<xref ref-type="fig" rid="fig4s2">Figure 4—figure supplement 2</xref>). 
For MNLE, we found that these biases were smaller, and furthermore that it was possible to mitigate this effect by inferring the posterior using ensembles, for example, by combining samples inferred with five MNLEs trained with identical settings but different random initialization (see Appendix 1 for details). These results show the utility of using SBC as a tool to test posterior coverage, especially when studying models for which reference posteriors are not available, as we demonstrate in the next section.</p></sec><sec id="s2-5"><title>MNLE infers well-calibrated, predictive posteriors for a DDM with collapsing bounds</title><p>MNLE was designed to be applicable to models for which evaluation of the likelihood is not practical so that standard inference tools cannot be used. To demonstrate this, we applied MNLE to a variant of the DDM for which analytical likelihoods are not available (note, however, that numerical approximation of likelihoods for this model would be possible, see e.g., <xref ref-type="bibr" rid="bib59">Shinn et al., 2020</xref>, Materials and methods for details). This DDM variant simulates a decision variable like the simple DDM used above, but with linearly collapsing instead of constant decision boundaries (see e.g., <xref ref-type="bibr" rid="bib22">Hawkins et al., 2015</xref>; <xref ref-type="bibr" rid="bib42">Palestro et al., 2018</xref>). 
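Such a variant remains easy to simulate even though its likelihood is intractable. The following is a self-contained sketch in which the extra parameter `gamma` sets the collapse slope; the symmetric collapse toward the boundary midpoint and the step size are illustrative assumptions, not the paper's exact parameterization:

```python
import math
import random

def simulate_ddm_collapsing(v, a, w, tau, gamma, dt=1e-3, rng=random):
    """One trial of a DDM whose boundaries collapse linearly at slope gamma.

    The upper bound starts at a and shrinks as a - gamma * t; the lower
    bound rises symmetrically, so both meet at a/2. Returns (choice, rt).
    """
    x = w * a
    t = 0.0
    sqrt_dt = math.sqrt(dt)
    while True:
        upper = max(a - gamma * t, a / 2)  # stop the bounds from crossing
        lower = a - upper                  # symmetric collapse toward a/2
        if x >= upper:
            return 1, tau + t
        if x <= lower:
            return 0, tau + t
        x += v * dt + rng.gauss(0.0, 1.0) * sqrt_dt
        t += dt

random.seed(0)
choice, rt = simulate_ddm_collapsing(v=0.5, a=1.5, w=0.5, tau=0.3, gamma=0.2)
```

Because MNLE only needs (θ, x) pairs from such a simulator, no change to the inference machinery is required when the bound shape changes.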
The collapsing bounds are incorporated with an additional parameter <italic>γ</italic> indicating the slope of the decision boundary, such that <inline-formula><mml:math id="inf34"><mml:mrow><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>a</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi><mml:mo>,</mml:mo><mml:mi>w</mml:mi><mml:mo>,</mml:mo><mml:mi>τ</mml:mi><mml:mo>,</mml:mo><mml:mi>γ</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> (see Details of the numerical comparison).</p><p>We tested inference with MNLE on the DDM with linearly collapsing bound using observations comprised of 100 i.i.d. trials simulated with parameters sampled from the prior. Using the same MNLE training and MCMC settings as above, we found that posterior density concentrated around the underlying ground-truth parameters (see <xref ref-type="fig" rid="fig5">Figure 5a</xref>), suggesting that MNLE learned the underlying likelihood accurately. To assess inference quality systematically without needing reference posteriors, we performed posterior predictive checks by running simulations with the inferred posteriors samples and comparing them to observed (simulated) data, and checked posterior calibration properties using SBC. We found that the inferred posteriors have good predictive performance, that is, data simulated from the inferred posterior samples accurately matched the observed data (<xref ref-type="fig" rid="fig5">Figure 5b</xref>), and that their uncertainties are well calibrated as quantified by the SBC results (<xref ref-type="fig" rid="fig5">Figure 5c</xref>). 
Overall, this indicated that MNLE accurately inferred the posterior of this intractable variant of the DDM.</p><fig id="fig5" position="float"><label>Figure 5.</label><caption><title>Mixed neural likelihood estimation (MNLE) infers accurate posteriors for the drift-diffusion model (DDM) with collapsing bounds.</title><p>Posterior samples were obtained given 100-trial observations simulated from the DDM with linearly collapsing bounds, using MNLE and Markov Chain Monte Carlo (MCMC). (<bold>a</bold>) Approximate posteriors shown as 95% contour lines in a corner plot of one- (diagonal) and two-dimensional (upper triangle) marginals, for a representative 100-trial observation simulated from the DDM. (<bold>b</bold>) Reaction times and choices simulated from the ground-truth parameters (blue) compared to those simulated given parameters sampled from the prior (prior predictive distribution, purple) and from the MNLE posterior shown in (<bold>a</bold>) (posterior predictive distribution, green). (<bold>c</bold>) Simulation-based calibration results showing empirical cumulative density functions (CDF) of the ground-truth parameters ranked under the inferred posteriors, calculated from 100 different observations. A well-calibrated posterior must have uniformly distributed ranks, as indicated by the area shaded gray. 
Shown for each parameter separately (<inline-formula><mml:math id="inf35"><mml:mi>v</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="inf36"><mml:mi>a</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="inf37"><mml:mi>w</mml:mi></mml:math></inline-formula>, <italic>τ</italic>, and <italic>γ</italic>).</p></caption><graphic mimetype="image" mime-subtype="jpeg" xlink:href="elife-77220.xml.media/fig5.jpg"/></fig></sec></sec><sec id="s3" sec-type="discussion"><title>Discussion</title><p>Statistical inference for computational models in cognitive neuroscience can be challenging because models often do not have tractable likelihood functions. The recently proposed LAN method (<xref ref-type="bibr" rid="bib16">Fengler et al., 2021</xref>) performs SBI for a subset of such models (DDMs) by training neural networks with model simulations to approximate the intractable likelihood. However, LANs require large amounts of training data, restricting their usage to canonical models. We proposed an alternative approach called MNLE, a synthetic neural likelihood method which is tailored to the data types encountered in many models of decision-making.</p><p>Our comparison on a tractable example problem used in <xref ref-type="bibr" rid="bib16">Fengler et al., 2021</xref> showed that MNLE performed on par with LANs using six orders of magnitude fewer model simulations for training. While <xref ref-type="bibr" rid="bib16">Fengler et al., 2021</xref> discuss that LANs were not optimized for simulation efficiency and that it might be possible to reduce the required model simulations, we emphasize that the difference in simulation efficiency is due to an inherent property of LANs. For each parameter in the training data, LANs require empirical likelihood targets that have to be estimated by building histograms or kernel density estimates from thousands of simulations. 
MNLE, instead, performs conditional density estimation without the need for likelihood targets and can work with only one simulation per parameter. Because of these conceptual differences, we expect the substantial performance advantage of MNLE to be robust to the specifics of the implementation.</p><p>After the networks are trained, the time needed for each evaluation determines the speed of inference. In that respect, both LANs and MNLEs are conceptually similar in that they require a single forward-pass through a neural network for each evaluation, and we found MNLE and the original implementation of LANs to require comparable computation times. However, evaluation time will depend, for example, on the exact network architecture, software framework, and computing infrastructure used. Code for a new PyTorch implementation of LANs has recently been released and improved upon the evaluation speed of the original implementation we compared to. While this new implementation made LAN significantly faster for a single forward-pass, we observed that the resulting time difference with the MCMC settings used here was only on the order of minutes, whereas the difference in simulation time for LAN<sup>11</sup> vs. MNLE<sup>5</sup> was on the order of days. The exact timings will always be implementation-specific and whether or not these differences are important will depend on the application at hand. In a situation where iteration over model design is required (i.e., custom DDMs), an increase in training efficiency on the order of days would be advantageous.</p><p>There exist a number of approaches with corresponding software packages for estimating parameters of cognitive neuroscience models, and DDMs in particular. 
However, these approaches either only estimate single best-fitting parameters (<xref ref-type="bibr" rid="bib71">Voss and Voss, 2007</xref>; <xref ref-type="bibr" rid="bib72">Wagenmakers et al., 2007</xref>; <xref ref-type="bibr" rid="bib7">Chandrasekaran and Hawkins, 2019</xref>; <xref ref-type="bibr" rid="bib23">Heathcote et al., 2019</xref>; <xref ref-type="bibr" rid="bib59">Shinn et al., 2020</xref>) or, if they perform fully Bayesian inference, can only produce Gaussian approximations to posteriors (<xref ref-type="bibr" rid="bib15">Feltgen and Daunizeau, 2021</xref>), or are restricted to variants of the DDM for which the likelihood can be evaluated (<xref ref-type="bibr" rid="bib73">Wiecki et al., 2013</xref>, HDDM [Hierarchical DDM] toolbox). A recent extension of the HDDM toolbox includes LANs, thereby combining HDDM’s flexibility with LAN’s ability to perform inference without access to the likelihood function (but this remains restricted to variants of the DDM for which LAN can be pretrained). In contrast, MNLE can be applied to any simulation-based model with intractable likelihoods and mixed data type outputs. Here, we focused on the direct comparison to LANs based on variants of the DDM. We note that these models have a rather low-dimensional observation structure (as common in many cognitive neuroscience models), and that our examples did not include additional parameter structure, for example, stimulus encoding parameters, which would increase the dimensionality of the learning problem. However, other variants of neural density estimation have been applied successfully to a variety of problems with higher dimensionality (see e.g., <xref ref-type="bibr" rid="bib18">Gonçalves et al., 2020</xref>; <xref ref-type="bibr" rid="bib36">Lueckmann et al., 2021</xref>; <xref ref-type="bibr" rid="bib17">Glöckler et al., 2021</xref>; <xref ref-type="bibr" rid="bib11">Dax et al., 2022</xref>). 
Therefore, we expect MNLE to be applicable to other simulation-based problems with higher-dimensional observation structure and parameter spaces, and to scale more favorably than LANs. As for any neural network-based approach, applying MNLE to different inference problems may require selecting different architecture and training hyperparameter settings, which may induce additional computational training costs. To help with this process, we adopted default settings which have been shown to work well on a large range of SBI benchmarking problems (<xref ref-type="bibr" rid="bib36">Lueckmann et al., 2021</xref>), and we integrated MNLE into the established sbi python package, which provides well-documented implementations for training and inference with SBI algorithms.</p><p>Several extensions to classical SL approaches have addressed the problem of a bias in the likelihood approximation due to strong parametric assumptions (i.e., Gaussianity), the use of summary statistics, or finite-sample biases (<xref ref-type="bibr" rid="bib50">Price et al., 2018</xref>; <xref ref-type="bibr" rid="bib40">Ong et al., 2009</xref>; <xref ref-type="bibr" rid="bib68">van Opheusden et al., 2020</xref>). MNLE builds on flexible neural likelihood estimators, for example, normalizing flows, and does not require summary statistics for a low-dimensional simulator like the DDM, and so is less susceptible to the first two biases. It could, however, be subject to biases resulting from estimating log-likelihoods from a finite number of simulations. In our numerical experiments, and for the simulation budgets we used, we did not observe biased inference results.
We speculate that the ability of neural density estimators to pool data across multiple parameter settings (rather than using only data from a specific parameter set, as in classical SL methods) mitigates finite-sample effects.</p><p>MNLE is an SBI method which uses neural density estimators to estimate likelihoods. Alternatives to neural likelihood estimation include neural posterior estimation (NPE, <xref ref-type="bibr" rid="bib44">Papamakarios and Murray, 2016</xref>; <xref ref-type="bibr" rid="bib34">Lueckmann et al., 2017</xref>; <xref ref-type="bibr" rid="bib19">Greenberg et al., 2019</xref>, which uses conditional density estimation to learn the posterior directly) and neural ratio estimation (NRE, <xref ref-type="bibr" rid="bib24">Hermans et al., 2020</xref>; <xref ref-type="bibr" rid="bib14">Durkan et al., 2020</xref>, which uses classification to approximate the likelihood-to-evidence ratio to then perform MCMC). These approaches could in principle be applied here as well; however, they are not as well suited as MNLE for the flexible inference scenarios common in decision-making models. NPE directly targets the posterior and therefore, by design, typically requires retraining if the structure of the problem changes (e.g., if the prior or the hierarchical structure of the model changes). There are variants of NPE that use embedding nets which can amortize over a changing number of trials, avoiding retraining (<xref ref-type="bibr" rid="bib53">Radev et al., 2022</xref>; <xref ref-type="bibr" rid="bib70">von Krause et al., 2022</xref>). NRE learns the likelihood-to-evidence ratio using ratio estimation (and not density estimation) and would not provide an emulator of the simulator.</p><p>Regarding future research directions, MNLE has the potential to become more simulation-efficient by using weight sharing between the discrete and the continuous neural density estimators (rather than using separate neural networks, as we did here).
Moreover, for high-dimensional inference problems in which slice sampling-based MCMC might struggle, MNLE could be used in conjunction with gradient-based MCMC methods like Hamiltonian Monte Carlo (HMC, <xref ref-type="bibr" rid="bib6">Brooks et al., 2011</xref>; <xref ref-type="bibr" rid="bib25">Hoffman and Gelman, 2014</xref>), or variational inference as recently proposed by <xref ref-type="bibr" rid="bib74">Wiqvist et al., 2021</xref> and <xref ref-type="bibr" rid="bib17">Glöckler et al., 2021</xref>. With MNLE’s full integration into the sbi package, both gradient-based MCMC methods from Pyro (<xref ref-type="bibr" rid="bib3">Bingham et al., 2019</xref>) and variational inference for SBI (SNVI, <xref ref-type="bibr" rid="bib17">Glöckler et al., 2021</xref>) are available as inference methods for MNLE (a comparison of HMC and SNVI to slice sampling MCMC on one example observation resulted in indistinguishable posterior samples). Finally, using its emulator property (see Appendix 1), MNLE could be applied in an active learning setting for highly expensive simulators in which new simulations are chosen adaptively according to an acquisition function in a Bayesian optimization framework (<xref ref-type="bibr" rid="bib20">Gutmann and Corander, 2016</xref>; <xref ref-type="bibr" rid="bib35">Lueckmann et al., 2019</xref>; <xref ref-type="bibr" rid="bib26">Järvenpää et al., 2019</xref>).</p><p>In summary, MNLE enables flexible and efficient inference of parameters of models in cognitive neuroscience with intractable likelihoods. The training efficiency and flexibility of the neural density estimators used overcome the limitations of LANs (<xref ref-type="bibr" rid="bib16">Fengler et al., 2021</xref>). Critically, these features enable researchers to develop customized models of decision-making and not just apply existing models to new data.
We implemented our approach as an extension to the public <ext-link ext-link-type="uri" xlink:href="https://github.com/mackelab/sbi">sbi</ext-link> python package, with detailed documentation and examples to make it accessible to practitioners.</p></sec><sec id="s4" sec-type="materials|methods"><title>Materials and methods</title><sec id="s4-1"><title>Mixed neural likelihood estimation</title><p>MNLE extends the framework of neural likelihood estimation (<xref ref-type="bibr" rid="bib45">Papamakarios et al., 2019a</xref>; <xref ref-type="bibr" rid="bib35">Lueckmann et al., 2019</xref>) to be applicable to simulation-based models with mixed data types. It learns a parametric model <inline-formula><mml:math id="inf38"><mml:mrow><mml:msub><mml:mi>q</mml:mi><mml:mi>ψ</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi mathvariant="bold">x</mml:mi><mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> of the intractable likelihood <inline-formula><mml:math id="inf39"><mml:mrow><mml:mi>p</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi mathvariant="bold">x</mml:mi><mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> defined implicitly by the simulation-based model.
The parameters <italic>ψ</italic> are learned with training data <inline-formula><mml:math id="inf40"><mml:msub><mml:mrow><mml:mo stretchy="false">{</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold">x</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo stretchy="false">}</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>:</mml:mo><mml:mi>N</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> comprised of model parameters <inline-formula><mml:math id="inf41"><mml:msub><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:math></inline-formula> and their corresponding data simulated from the model <inline-formula><mml:math id="inf42"><mml:mrow><mml:mrow><mml:msub><mml:mi mathvariant="bold">x</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow><mml:mo>∼</mml:mo><mml:mrow><mml:mi>p</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi mathvariant="bold">x</mml:mi><mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math></inline-formula>. The parameters are sampled from a proposal distribution over parameters <inline-formula><mml:math id="inf43"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>∼</mml:mo><mml:mrow><mml:mi>p</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math></inline-formula>. 
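The training-set construction just described — draw each θ<sub>n</sub> from the proposal, then simulate x<sub>n</sub> | θ<sub>n</sub> — can be sketched generically. In the snippet below, the uniform proposal and the Gaussian toy simulator are placeholders for illustration, not the DDM or proposal used in this article:

```python
import random

def proposal():
    # Stand-in for the proposal p(theta): uniform over an arbitrary range.
    return random.uniform(-2.0, 2.0)

def simulator(theta):
    # Placeholder simulator: x | theta ~ N(theta, 1).
    return random.gauss(theta, 1.0)

def make_training_set(n):
    # Training data {theta_n, x_n}_{1:N}: one simulation per parameter.
    thetas = [proposal() for _ in range(n)]
    return [(theta, simulator(theta)) for theta in thetas]

training_set = make_training_set(1000)
```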
The proposal can be any distribution, but it determines the parameter regions in which the density estimator will approximate the likelihood well. Thus, the prior, or a distribution that contains the support of the prior (or even of all priors one expects to use in the future), is a useful choice. After training, the emulator can be used to generate synthetic data <inline-formula><mml:math id="inf44"><mml:mrow><mml:mrow><mml:mi mathvariant="bold">x</mml:mi><mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:mrow><mml:mo>∼</mml:mo><mml:mrow><mml:msub><mml:mi>q</mml:mi><mml:mi>ψ</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi mathvariant="bold">x</mml:mi><mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math></inline-formula> given parameters, and to evaluate the SL <inline-formula><mml:math id="inf45"><mml:mrow><mml:msub><mml:mi>q</mml:mi><mml:mi>ψ</mml:mi></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi mathvariant="bold">x</mml:mi><mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> given experimentally observed data.
Finally, the SL can be used to obtain posterior samples via<disp-formula id="equ1"><label>(1)</label><mml:math id="m1"><mml:mrow><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="bold">x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>∝</mml:mo><mml:msub><mml:mi>q</mml:mi><mml:mrow><mml:mi>ψ</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi mathvariant="bold">x</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>,</mml:mo></mml:mstyle></mml:mrow></mml:math></disp-formula></p><p>through approximate inference with MCMC. Importantly, the training is amortized, that is, the emulator can be applied to new experimental data without retraining (running MCMC is still required).</p><p>We tailored MNLE to simulation-based models that return mixed data, for example, in the form of reaction times <inline-formula><mml:math id="inf46"><mml:mrow><mml:mi>r</mml:mi><mml:mo>⁢</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:math></inline-formula> and (usually categorical) choices <inline-formula><mml:math id="inf47"><mml:mi>c</mml:mi></mml:math></inline-formula> as for the DDM. Conditional density estimation with normalizing flows has been proposed for continuous random variables (<xref ref-type="bibr" rid="bib45">Papamakarios et al., 2019a</xref>) or discrete random variables (<xref ref-type="bibr" rid="bib63">Tran et al., 2019</xref>), but not for mixed data.
Our solution for estimating the likelihood of mixed data is to simply factorize the likelihood into continuous and discrete variables,<disp-formula id="equ2"><label>(2)</label><mml:math id="m2"><mml:mrow><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>r</mml:mi><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>c</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>r</mml:mi><mml:mi>t</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:mrow><mml:mo>,</mml:mo><mml:mi>c</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mspace width="thickmathspace"/><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>c</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>,</mml:mo></mml:mstyle></mml:mrow></mml:math></disp-formula></p><p>and use two separate neural likelihood estimators: A discrete one <inline-formula><mml:math id="inf48"><mml:msub><mml:mi>q</mml:mi><mml:msub><mml:mi>ψ</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:msub></mml:math></inline-formula> to estimate <inline-formula><mml:math id="inf49"><mml:mrow><mml:mi>p</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>c</mml:mi><mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> and a continuous one <inline-formula><mml:math 
id="inf50"><mml:msub><mml:mi>q</mml:mi><mml:msub><mml:mi>ψ</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>⁢</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:msub></mml:math></inline-formula> to estimate <inline-formula><mml:math id="inf51"><mml:mrow><mml:mi>p</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mi>r</mml:mi><mml:mo>⁢</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo><mml:mrow><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mo>,</mml:mo><mml:mi>c</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>. We defined <inline-formula><mml:math id="inf52"><mml:msub><mml:mi>q</mml:mi><mml:msub><mml:mi>ψ</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:msub></mml:math></inline-formula> to be a Bernoulli model and used a neural network to learn the Bernoulli probability <italic>ρ</italic> given parameters <inline-formula><mml:math id="inf53"><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:math></inline-formula>. For <inline-formula><mml:math id="inf54"><mml:msub><mml:mi>q</mml:mi><mml:msub><mml:mi>ψ</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>⁢</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:msub></mml:math></inline-formula> we used a conditional neural spline flow (NSF, <xref ref-type="bibr" rid="bib13">Durkan et al., 2019</xref>) to learn the density of <inline-formula><mml:math id="inf55"><mml:mrow><mml:mi>r</mml:mi><mml:mo>⁢</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:math></inline-formula> given a parameter <inline-formula><mml:math id="inf56"><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:math></inline-formula> <italic>and</italic> choice <inline-formula><mml:math id="inf57"><mml:mi>c</mml:mi></mml:math></inline-formula>. The two estimators are trained separately using the same training data (see Neural network architecture, training, and hyperparameters for details).
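The two-estimator construction can be sketched in a few lines. In the snippet below, the logistic choice probability and the log-normal reaction-time density are stand-ins for the trained Bernoulli network and conditional flow, not the architectures used in this article:

```python
import math

def choice_log_prob(c, theta):
    # Stand-in for q_{psi_c}(c | theta): Bernoulli with logistic probability.
    rho = 1.0 / (1.0 + math.exp(-theta[0]))
    return math.log(rho if c == 1 else 1.0 - rho)

def rt_log_prob(rt, theta, c):
    # Stand-in for q_{psi_rt}(rt | theta, c): log-normal density whose
    # location depends on both the parameters and the choice.
    mu, sigma = theta[1] + 0.1 * c, 0.5
    z = (math.log(rt) - mu) / sigma
    return -math.log(rt * sigma * math.sqrt(2.0 * math.pi)) - 0.5 * z * z

def mixed_log_likelihood(rt, c, theta):
    # The factorization q(rt, c | theta) = q_c(c | theta) * q_rt(rt | theta, c),
    # i.e. a sum in log space.
    return choice_log_prob(c, theta) + rt_log_prob(rt, theta, c)

log_lik = mixed_log_likelihood(0.8, 1, (0.5, -0.2))
```

Because the continuous density is conditioned on the choice, dependencies between reaction times and choices are preserved in this factorization.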
After training, the full neural likelihood can be constructed by multiplying the likelihood estimates <inline-formula><mml:math id="inf58"><mml:msub><mml:mi>q</mml:mi><mml:msub><mml:mi>ψ</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math id="inf59"><mml:msub><mml:mi>q</mml:mi><mml:msub><mml:mi>ψ</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>⁢</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:msub></mml:math></inline-formula> back together:<disp-formula id="equ3"><label>(3)</label><mml:math id="m3"><mml:mrow><mml:mstyle displaystyle="true" scriptlevel="0"><mml:msub><mml:mi>q</mml:mi><mml:mrow><mml:msub><mml:mi>ψ</mml:mi><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>ψ</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mi>r</mml:mi><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>c</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mi>q</mml:mi><mml:mrow><mml:msub><mml:mi>ψ</mml:mi><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mi>c</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mspace width="thickmathspace"/><mml:msub><mml:mi>q</mml:mi><mml:mrow><mml:msub><mml:mi>ψ</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mi>r</mml:mi><mml:mi>t</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mi>c</mml:mi><mml:mo>,</mml:mo><mml:mrow><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:mrow><mml:mo 
stretchy="false">)</mml:mo><mml:mo>.</mml:mo></mml:mstyle></mml:mrow></mml:math></disp-formula></p><p>Note that, as the second estimator <inline-formula><mml:math id="inf60"><mml:mrow><mml:msub><mml:mi>q</mml:mi><mml:msub><mml:mi>ψ</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>⁢</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:msub><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>r</mml:mi><mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo><mml:mrow><mml:mi>c</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> is conditioned on the choice <inline-formula><mml:math id="inf61"><mml:mi>c</mml:mi></mml:math></inline-formula>, our likelihood model can account for statistical dependencies between choices and reaction times. The neural likelihood can then be used to approximate the intractable likelihood defined by the simulator, for example, for inference with MCMC. Additionally, it can be used to sample synthetic data given model parameters, without running the simulator (see The emulator property of MNLE).</p></sec><sec id="s4-2"><title>Relation to LAN</title><p>Neural likelihood estimation can be much more simulation efficient than previous approaches because it exploits continuity across the parameters by making the density estimation conditional. <xref ref-type="bibr" rid="bib16">Fengler et al., 2021</xref>, too, aim to exploit continuity across the parameter space by training a neural network to predict the value of the likelihood function from parameters <inline-formula><mml:math id="inf62"><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:math></inline-formula> and data <inline-formula><mml:math id="inf63"><mml:mi mathvariant="bold">x</mml:mi></mml:math></inline-formula>. 
However, the difference from neural likelihood estimation is that they do not use the neural network for density estimation directly, but instead perform classical neural network-based regression on likelihood targets. Crucially, the likelihood targets first have to be obtained for each parameter in the training dataset. To do so, one has to perform density estimation using KDE (as proposed by <xref ref-type="bibr" rid="bib65">Turner et al., 2015</xref>) or empirical histograms for <italic>every</italic> parameter separately. Once trained, LANs do indeed exploit the continuity across the parameter space (they can predict log-likelihoods given unseen data and parameters); however, they do so at a very high simulation cost: for a training dataset of <inline-formula><mml:math id="inf64"><mml:mi>N</mml:mi></mml:math></inline-formula> parameters, they perform KDE <inline-formula><mml:math id="inf65"><mml:mi>N</mml:mi></mml:math></inline-formula> times based on <inline-formula><mml:math id="inf66"><mml:mi>M</mml:mi></mml:math></inline-formula> simulations each (for models with categorical output, that is, all decision-making models, KDE is performed separately for each choice), resulting in an overall simulation budget of <inline-formula><mml:math id="inf67"><mml:mrow><mml:mi>N</mml:mi><mml:mo>⋅</mml:mo><mml:mi>M</mml:mi></mml:mrow></mml:math></inline-formula> (<inline-formula><mml:math id="inf68"><mml:mrow><mml:mi>N</mml:mi><mml:mo>=</mml:mo><mml:mi/></mml:mrow></mml:math></inline-formula> 1.5 million and <inline-formula><mml:math id="inf69"><mml:mrow><mml:mi>M</mml:mi><mml:mo>=</mml:mo><mml:mi/></mml:mrow></mml:math></inline-formula> 100,000 for the ‘pointwise’ LAN approach).</p></sec><sec id="s4-3"><title>Details of the numerical comparison</title><p>The comparison between MNLE and LAN is based on the DDM.
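The per-parameter density-estimation step just described can be caricatured as follows. The Gaussian toy simulator and the crude boxcar density estimate (standing in for KDE) are placeholders; the point is only the N × M simulation cost of producing regression targets:

```python
import random

def simulator(theta):
    # Placeholder simulator; LANs would use the DDM here.
    return random.gauss(theta, 1.0)

def likelihood_targets(thetas, m, eval_points, half_width=0.5):
    """For each theta, run m simulations and estimate a density value at each
    evaluation point with a boxcar count (a crude stand-in for KDE).
    Total simulation cost: len(thetas) * m simulator calls."""
    targets = []
    for theta in thetas:
        sims = [simulator(theta) for _ in range(m)]
        for x in eval_points:
            density = sum(abs(s - x) < half_width for s in sims) / (m * 2 * half_width)
            targets.append((theta, x, density))  # regression target for the net
    return targets

targets = likelihood_targets(thetas=[0.0, 1.0], m=1000, eval_points=[0.0, 1.0, 2.0])
```

A conditional density estimator trained directly on (theta, x) pairs skips the target-generation stage entirely, needing only one simulation per parameter.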
The DDM simulates a decision variable <inline-formula><mml:math id="inf70"><mml:mi>X</mml:mi></mml:math></inline-formula> as a stochastic differential equation with parameters <inline-formula><mml:math id="inf71"><mml:mrow><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>v</mml:mi><mml:mo>,</mml:mo><mml:mi>a</mml:mi><mml:mo>,</mml:mo><mml:mi>w</mml:mi><mml:mo>,</mml:mo><mml:mi>τ</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>:<disp-formula id="equ4"><label>(4)</label><mml:math id="m4"><mml:mrow><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mtext>d</mml:mtext><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mi>τ</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>v</mml:mi><mml:mtext>dt</mml:mtext><mml:mo>+</mml:mo><mml:mtext>d</mml:mtext><mml:mi>W</mml:mi><mml:mo>,</mml:mo><mml:mspace width="thickmathspace"/><mml:mspace width="thickmathspace"/><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi>τ</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>w</mml:mi><mml:mo>,</mml:mo></mml:mstyle></mml:mrow></mml:math></disp-formula></p><p>where <inline-formula><mml:math id="inf72"><mml:mi>W</mml:mi></mml:math></inline-formula> is a Wiener noise process.
The priors over the parameters are defined to be uniform: <inline-formula><mml:math id="inf73"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>v</mml:mi><mml:mo>∼</mml:mo><mml:mrow><mml:mi mathvariant="script">U</mml:mi></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mo>−</mml:mo><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mstyle></mml:math></inline-formula> is the drift, <inline-formula><mml:math id="inf74"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>a</mml:mi><mml:mo>∼</mml:mo><mml:mrow><mml:mi mathvariant="script">U</mml:mi></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mn>0.5</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mstyle></mml:math></inline-formula> the boundary separation, <inline-formula><mml:math id="inf75"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>w</mml:mi><mml:mo>∼</mml:mo><mml:mrow><mml:mi mathvariant="script">U</mml:mi></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mn>0.3</mml:mn><mml:mo>,</mml:mo><mml:mn>0.7</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mstyle></mml:math></inline-formula> the initial offset, and <inline-formula><mml:math id="inf76"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>τ</mml:mi><mml:mo>∼</mml:mo><mml:mrow><mml:mi mathvariant="script">U</mml:mi></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mn>0.2</mml:mn><mml:mo>,</mml:mo><mml:mn>1.8</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mstyle></mml:math></inline-formula> the nondecision time. 
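Equation 4 together with these priors can be simulated with a simple Euler–Maruyama scheme. The sketch below is illustrative only (this article uses the implementation of Drugowitsch, 2016); in particular, starting the diffusion at w·a and absorbing at 0 and a is a convention assumed here:

```python
import random

def simulate_ddm(v, a, w, tau, dt=1e-3, max_steps=1_000_000):
    """Euler-Maruyama sketch of Equation 4: dX = v*dt + dW, started at w*a.
    Returns (choice, rt) with choice 1 for the upper boundary a, 0 for 0."""
    x = w * a
    sqrt_dt = dt ** 0.5
    for step in range(1, max_steps + 1):
        x += v * dt + sqrt_dt * random.gauss(0.0, 1.0)
        if x >= a:
            return 1, tau + step * dt
        if x <= 0.0:
            return 0, tau + step * dt
    # Safety cap; with these priors a boundary is hit long before this.
    return (1 if x > a / 2 else 0), tau + max_steps * dt

# Parameters drawn from the uniform priors stated above.
v, a = random.uniform(-2.0, 2.0), random.uniform(0.5, 2.0)
w, tau = random.uniform(0.3, 0.7), random.uniform(0.2, 1.8)
choice, rt = simulate_ddm(v, a, w, tau)
```

Note how the output matches the mixed data type of the model: a categorical choice and a continuous reaction time bounded below by the nondecision time.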
A single simulation from the model returns a choice <inline-formula><mml:math id="inf77"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>c</mml:mi><mml:mo>∈</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo fence="false" stretchy="false">}</mml:mo></mml:mrow></mml:mstyle></mml:math></inline-formula> and the corresponding reaction time in seconds <inline-formula><mml:math id="inf78"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>r</mml:mi><mml:mi>t</mml:mi><mml:mo>∈</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mi>τ</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">∞</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mstyle></mml:math></inline-formula>.</p><p>For this version of the DDM the likelihood of an observation <inline-formula><mml:math id="inf79"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>c</mml:mi><mml:mo>,</mml:mo><mml:mi>r</mml:mi><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mstyle></mml:math></inline-formula> given parameters <inline-formula><mml:math id="inf80"><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="inf81"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>L</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>c</mml:mi><mml:mo>,</mml:mo><mml:mi>r</mml:mi><mml:mi>t</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mstyle></mml:math></inline-formula>, can be calculated analytically (<xref ref-type="bibr" rid="bib38">Navarro and Fuss, 2009</xref>). To simulate the DDM and calculate analytical likelihoods we used the approach and the implementation proposed by <xref ref-type="bibr" rid="bib12">Drugowitsch, 2016</xref>. 
We numerically confirmed that the simulations and the analytical likelihoods match those obtained from the research code provided by <xref ref-type="bibr" rid="bib16">Fengler et al., 2021</xref>.</p><p>To run LANs, we downloaded the neural network weights of the pretrained models from the repository mentioned in <xref ref-type="bibr" rid="bib16">Fengler et al., 2021</xref>. The budget of training simulations used for the LANs was <inline-formula><mml:math id="inf82"><mml:mrow><mml:mn>1.5</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn>10</mml:mn><mml:mn>11</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> (1.5 million training data points, each obtained from KDE based on 10<sup>5</sup> simulations). We only considered the approach based on training a multilayer perceptron on single-trial likelihoods (‘pointwise approach’, <xref ref-type="bibr" rid="bib16">Fengler et al., 2021</xref>). At a later stage of our study we performed additional experiments to evaluate the performance of LANs trained at smaller simulation budgets, for example, for 10<sup>5</sup> and 10<sup>8</sup> simulations. For this analysis, we used an updated implementation of LANs based on PyTorch that was provided by the authors. We used the training routines and default settings provided with that implementation. 
To train LANs at the smaller budgets we used the following splits of the budget into the number of parameter settings drawn from the prior and the number of simulations per parameter setting used for fitting the KDE: for the 10<sup>5</sup> budget we used 10<sup>2</sup> and 10<sup>3</sup>, respectively (we ran experiments with the split reversed, but results were slightly better for this split); for the 10<sup>8</sup> budget we used an equal split of 10<sup>4</sup> each (all code publicly available, see Code availability).</p><p>To run MNLE, we extended the implementation of neural likelihood estimation in the sbi toolbox (<xref ref-type="bibr" rid="bib62">Tejero-Cantero et al., 2020</xref>). All comparisons were performed on a single AMD Ryzen Threadripper 1920X 12-Core processor at 2.2 GHz, and the code is publicly available (see Code availability).</p><p>For the DDM variant with linearly collapsing decision boundaries, the boundaries were parametrized by the initial boundary separation, <inline-formula><mml:math id="inf83"><mml:mi>a</mml:mi></mml:math></inline-formula>, and one additional parameter, <italic>γ</italic>, indicating the slope with which the boundary approaches zero. This resulted in a five-dimensional parameter space for which we used the same prior as above, plus an additional uniform prior for the slope <inline-formula><mml:math id="inf84"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>γ</mml:mi><mml:mo>∼</mml:mo><mml:mrow><mml:mi mathvariant="script">U</mml:mi></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mo>−</mml:mo><mml:mn>1.0</mml:mn><mml:mo>,</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mstyle></mml:math></inline-formula>. To simulate this DDM variant, we again used the Julia package by <xref ref-type="bibr" rid="bib12">Drugowitsch, 2016</xref>, but we note that for this variant no analytical likelihoods are available.
While it would be possible to approximate the likelihoods numerically using the Fokker–Planck equations (see e.g., <xref ref-type="bibr" rid="bib59">Shinn et al., 2020</xref>), this would usually involve a trade-off between computation time and accuracy as well as design of bespoke solutions for individual models, and was not pursued here.</p></sec><sec id="s4-4"><title>Flexible Bayesian inference with MCMC</title><p>Once the MNLE is trained, it can be used for MCMC to obtain posterior samples <inline-formula><mml:math id="inf85"><mml:mrow><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mo>∼</mml:mo><mml:mrow><mml:mi>p</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mo lspace="2.5pt" rspace="2.5pt" stretchy="false">|</mml:mo><mml:mi mathvariant="bold">x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math></inline-formula> given experimentally observed data <inline-formula><mml:math id="inf86"><mml:mi mathvariant="bold">x</mml:mi></mml:math></inline-formula>. To sample from posteriors via MCMC, we transformed the parameters to unconstrained space, used slice sampling (<xref ref-type="bibr" rid="bib39">Neal, 2009</xref>), and initialized ten parallel chains using sequential importance sampling (<xref ref-type="bibr" rid="bib45">Papamakarios et al., 2019a</xref>), all as implemented in the sbi toolbox. We ran MCMC with identical settings for MNLE and LAN.</p><p>Importantly, performing MNLE and then using MCMC to obtain posterior samples allows for flexible inference scenarios because the form of <inline-formula><mml:math id="inf87"><mml:mi mathvariant="bold">x</mml:mi></mml:math></inline-formula> is not fixed. For example, when the model produces trial-based data that satisfy the i.i.d. 
assumption, for example, a set of reaction times and choices <inline-formula><mml:math id="inf88"><mml:mrow><mml:mi mathvariant="bold">X</mml:mi><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mo stretchy="false">{</mml:mo><mml:mrow><mml:mi>r</mml:mi><mml:mo>⁢</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo>,</mml:mo><mml:mi>c</mml:mi><mml:mo stretchy="false">}</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula> in a DDM, then MNLE makes it possible to perform inference given varying numbers of trials, without retraining. This is achieved by training MNLE based on single-trial likelihoods once and then combining multiple trials into the joint likelihood only when running MCMC:<disp-formula id="equ5"><label>(5)</label><mml:math id="m5"><mml:mrow><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="bold">X</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>∝</mml:mo><mml:munderover><mml:mo>∏</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover><mml:mi>q</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>r</mml:mi><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>c</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mspace width="thickmathspace"/><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:mrow><mml:mo
stretchy="false">)</mml:mo><mml:mo>.</mml:mo></mml:mstyle></mml:mrow></mml:math></disp-formula></p><p>Similarly, one can use the neural likelihood to perform hierarchical inference with MCMC, all without the need for retraining (see <xref ref-type="bibr" rid="bib24">Hermans et al., 2020</xref>; <xref ref-type="bibr" rid="bib16">Fengler et al., 2021</xref>, for examples).</p><sec id="s4-4-1"><title>Stimulus- and intertrial dependencies</title><p>Simulation-based models in cognitive neuroscience often depend not only on a set of parameters <inline-formula><mml:math id="inf89"><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:math></inline-formula>, but additionally on (a set of) stimulus variables <inline-formula><mml:math id="inf90"><mml:mi>s</mml:mi></mml:math></inline-formula> which are typically given as part of the experimental conditions. MNLE can be readily adapted to this scenario by generating simulated data with multiple stimulus variables, and including them as additional inputs to the network during inference. Similarly, MNLE could be adapted to scenarios in which the i.i.d. assumption across trials as used above (see Flexible Bayesian inference with MCMC) does not hold. Again, this would be achieved by adapting the model simulator accordingly. 
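As a minimal sketch of this adaptation (a hypothetical toy simulator for illustration, not the DDM implementation used here), the stimulus variable simply becomes an additional conditioning column alongside the parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_trial(theta, s):
    """Toy stand-in for a simulator whose output depends on parameters
    theta = (drift, bound) and a per-trial stimulus strength s."""
    drift, bound = theta
    rt = bound / (abs(drift * s) + 0.5) + rng.exponential(0.1)
    choice = int(drift * s + rng.normal() > 0)
    return rt, choice

# Simulate a training set; the stimulus is just another input column, so an
# emulator trained on (conditions, data) learns p(rt, c | theta, s) with no
# change to the training procedure itself.
thetas = rng.uniform([-2.0, 0.5], [2.0, 2.0], size=(1000, 2))
stimuli = rng.choice([0.25, 0.5, 1.0], size=1000)
data = np.array([simulate_trial(t, s) for t, s in zip(thetas, stimuli)])
conditions = np.column_stack([thetas, stimuli])  # network input: (theta, s)
```

At inference time, the observed stimuli are then fixed in the conditioning input while only the parameters are sampled.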
For example, when inferring parameters <inline-formula><mml:math id="inf91"><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:math></inline-formula> of a DDM for which the outcome of the current trial <inline-formula><mml:math id="inf92"><mml:mi>i</mml:mi></mml:math></inline-formula> additionally depends on current stimulus variables <italic>s</italic><sub><italic>i</italic></sub> as well as on previous stimuli <italic>s</italic><sub><italic>j</italic></sub> and responses <italic>r</italic><sub><italic>j</italic></sub>, then one would implement the DDM simulator as a function <inline-formula><mml:math id="inf93"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>f</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:mrow><mml:mo>;</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>−</mml:mo><mml:mi>T</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>…</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>;</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>−</mml:mo><mml:mi>T</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>…</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>−</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mstyle></mml:math></inline-formula> (where <inline-formula><mml:math id="inf94"><mml:mi>T</mml:mi></mml:math></inline-formula> is a history parameter) and then learn the underlying likelihood by emulating <inline-formula><mml:math id="inf95"><mml:mi>f</mml:mi></mml:math></inline-formula> with MNLE.</p></sec></sec><sec id="s4-5"><title>Neural network architecture, training, and hyperparameters</title><sec id="s4-5-1"><title>Architecture</title><p>For the architecture of the Bernoulli model we chose a feed-forward neural network that takes parameters <inline-formula><mml:math 
id="inf96"><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:math></inline-formula> as input and predicts the Bernoulli probability <italic>ρ</italic> of the corresponding choices. For the normalizing flow we used the NSF architecture (<xref ref-type="bibr" rid="bib13">Durkan et al., 2019</xref>). NSFs use a standard normal base distribution and transform it using several modules of monotonic rational-quadratic splines whose parameters are learned by invertible neural networks. Using an unbounded base distribution for modeling data with bounded support, for example, reaction time data <inline-formula><mml:math id="inf97"><mml:mrow><mml:mrow><mml:mi>r</mml:mi><mml:mo>⁢</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo>∈</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">∞</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>, can be challenging. To account for this, we tested two approaches: We either transformed the reaction time data to logarithmic space to obtain an unbounded support (<inline-formula><mml:math id="inf98"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mi>log</mml:mi><mml:mo>⁡</mml:mo><mml:mi>r</mml:mi><mml:mi>t</mml:mi><mml:mo>∈</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mo>−</mml:mo><mml:mi mathvariant="normal">∞</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">∞</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mstyle></mml:math></inline-formula>), or we used a log-normal base distribution with rectified (instead of linear) tails for the splines (see <xref ref-type="bibr" rid="bib13">Durkan et al., 2019</xref>, for details and Architecture and training hyperparameters for the architecture settings used).</p></sec><sec id="s4-5-2"><title>Training</title><p>The neural network parameters <inline-formula><mml:math id="inf99"><mml:msub><mml:mi>ψ</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:math></inline-formula> and 
<inline-formula><mml:math id="inf100"><mml:msub><mml:mi>ψ</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>⁢</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> were trained using the maximum likelihood loss and the Adam optimizer (<xref ref-type="bibr" rid="bib28">Kingma and Ba, 2015</xref>). As proposal distribution for the training dataset we used the prior over DDM parameters. Given a training dataset of parameters, choices, and reaction times <inline-formula><mml:math id="inf101"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mo fence="false" stretchy="false">{</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>c</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>r</mml:mi><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:msubsup><mml:mo fence="false" stretchy="false">}</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:msubsup></mml:mrow></mml:mstyle></mml:math></inline-formula> with <inline-formula><mml:math id="inf102"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>∼</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>;</mml:mo><mml:mspace width="thickmathspace"/><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>c</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>r</mml:mi><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo 
stretchy="false">)</mml:mo><mml:mo>∼</mml:mo><mml:mtext>DDM</mml:mtext><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mstyle></mml:math></inline-formula>, we minimized the negative log-probability of the model. In particular, using the same training data, we trained the Bernoulli choice model by minimizing<disp-formula id="equ6"><label>(6)</label><mml:math id="m6"><mml:mrow><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mo>−</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>N</mml:mi></mml:mfrac><mml:munderover><mml:mo>∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover><mml:mi>log</mml:mi><mml:mo>⁡</mml:mo><mml:msub><mml:mi>q</mml:mi><mml:mrow><mml:msub><mml:mi>ψ</mml:mi><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>c</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>,</mml:mo></mml:mstyle></mml:mrow></mml:math></disp-formula></p><p>and the NSF by minimizing<disp-formula id="equ7"><label>(7)</label><mml:math id="m7"><mml:mrow><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mo>−</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>N</mml:mi></mml:mfrac><mml:munderover><mml:mo>∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover><mml:mi>log</mml:mi><mml:mo>⁡</mml:mo><mml:msub><mml:mi>q</mml:mi><mml:mrow><mml:msub><mml:mi>ψ</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo 
stretchy="false">(</mml:mo><mml:mi>r</mml:mi><mml:mi>t</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="bold-italic">θ</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>.</mml:mo></mml:mstyle></mml:mrow></mml:math></disp-formula></p><p>Training was performed with code and training hyperparameter settings provided in the sbi toolbox (<xref ref-type="bibr" rid="bib62">Tejero-Cantero et al., 2020</xref>).</p></sec><sec id="s4-5-3"><title>Hyperparameters</title><p>MNLE requires a number of hyperparameter choices regarding the neural network architectures, for example, the number of hidden layers, the number of hidden units, the number of stacked NSF transforms, and the kind of base distribution, among others (<xref ref-type="bibr" rid="bib13">Durkan et al., 2019</xref>). As our implementation builds on the sbi package, we based our hyperparameter choices on the default settings provided there. This resulted in likelihood accuracy similar to LAN, but longer evaluation times due to the complexity of the underlying normalizing flow architecture.</p><p>To reduce the evaluation time of MNLE, we further adapted the architecture to the example model (DDM). In particular, we ran a cross-validation over the hyperparameters relevant for evaluation time (number of hidden layers, hidden units, NSF transforms, and spline bins) and selected those that were optimal in terms of Huber loss and MSE between the approximate and the <italic>analytical</italic> likelihoods, as well as evaluation time. This resulted in an architecture with performance <italic>and</italic> evaluation time similar to LANs (more details in Appendix: Architecture and training hyperparameters). 
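The selection criterion can be sketched as follows; the held-out log-likelihood values below are made-up placeholders for illustration, not results from this work:

```python
import numpy as np

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def huber(a, b, delta=1.0):
    # quadratic for small errors, linear in the tails (robust to outliers)
    err = np.abs(a - b)
    clipped = np.minimum(err, delta)
    return float(np.mean(0.5 * clipped**2 + delta * (err - clipped)))

# Made-up log-likelihoods of two candidate architectures on held-out
# (theta, x) pairs, compared against the analytical reference values.
analytic = np.array([-1.2, -0.4, -2.1, -0.9])
candidates = {
    "2 layers, 5 spline bins": np.array([-1.1, -0.5, -2.0, -1.0]),
    "3 layers, 10 spline bins": np.array([-1.3, -0.2, -2.6, -0.7]),
}
scores = {k: huber(v, analytic) + mse(v, analytic)
          for k, v in candidates.items()}
best = min(scores, key=scores.get)  # smallest combined discrepancy wins
```

In practice, evaluation time would enter the ranking as an additional term alongside the two discrepancy measures.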
The cross-validation relied on access to the analytical likelihoods, which is usually not available in practice, for example, for simulators with intractable likelihoods. However, we note that in cases without access to analytical likelihoods, a similar cross-validation can be performed using quality measures other than the difference from the analytical likelihood, for example, by comparing the observed data with synthetic data and SLs provided by MNLE.</p></sec></sec></sec></body><back><sec sec-type="additional-information" id="s5"><title>Additional information</title><fn-group content-type="competing-interest"><title>Competing interests</title><fn fn-type="COI-statement" id="conf1"><p>No competing interests declared</p></fn><fn fn-type="COI-statement" id="conf2"><p>No competing interests declared</p></fn></fn-group><fn-group content-type="author-contribution"><title>Author contributions</title><fn fn-type="con" id="con1"><p>Conceptualization, Resources, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing</p></fn><fn fn-type="con" id="con2"><p>Conceptualization, Supervision, Visualization, Methodology, Writing – original draft, Writing – review and editing</p></fn><fn fn-type="con" id="con3"><p>Conceptualization, Software, Validation, Visualization, Writing – original draft, Writing – review and editing</p></fn><fn fn-type="con" id="con4"><p>Conceptualization, Supervision, Funding acquisition, Methodology, Project administration, Writing – review and editing</p></fn></fn-group></sec><sec sec-type="supplementary-material" id="s6"><title>Additional files</title><supplementary-material id="transrepform"><label>Transparent reporting form</label><media xlink:href="elife-77220-transrepform1-v2.pdf" mimetype="application" mime-subtype="pdf"/></supplementary-material></sec><sec sec-type="data-availability" id="s7"><title>Data availability</title><p>We implemented MNLE as part of 
the open source package for SBI, sbi, available at <ext-link ext-link-type="uri" xlink:href="https://github.com/mackelab/sbi">https://github.com/mackelab/sbi</ext-link>, copy archived at <ext-link ext-link-type="uri" xlink:href="https://archive.softwareheritage.org/swh:1:dir:d327a49bdf0127726927c70c6b216405196653d6;origin=https://github.com/mackelab/sbi;visit=swh:1:snp:ff58cc8344d88bbe9952872597690748adb5798b;anchor=swh:1:rev:d72fc6d790285c7779afbbe9a5f6b640691d4560">swh:1:rev:d72fc6d790285c7779afbbe9a5f6b640691d4560</ext-link>. Code for reproducing the results presented here, and tutorials on how to apply MNLE to other simulators using sbi can be found at <ext-link ext-link-type="uri" xlink:href="https://github.com/mackelab/mnle-for-ddms">https://github.com/mackelab/mnle-for-ddms</ext-link>, copy archived at <ext-link ext-link-type="uri" xlink:href="https://archive.softwareheritage.org/swh:1:dir:e20ac3b3d5324d29f262cc52388b3916526d2518;origin=https://github.com/mackelab/mnle-for-ddms;visit=swh:1:snp:b9707619efaa06519bb97d54db256cc99f78df3f;anchor=swh:1:rev:5e6cf714c223ec5c414b76ac70f7dc88d4fbd321">swh:1:rev:5e6cf714c223ec5c414b76ac70f7dc88d4fbd321</ext-link>.</p></sec><ack id="ack"><title>Acknowledgements</title><p>We thank Luigi Acerbi, Michael Deistler, Alexander Fengler, Michael Frank, and Ingeborg Wenger for discussions and comments on a preliminary version of the manuscript. 
We also acknowledge and thank the Python and Julia communities for developing the tools enabling this work, including DifferentialEquations.jl, DiffModels.jl, NumPy, pandas, Pyro, PyTorch, sbi, sbibm, and Scikit-learn (see Appendix for details).</p></ack><ref-list><title>References</title><ref id="bib1"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>An</surname><given-names>Z</given-names></name><name><surname>South</surname><given-names>LF</given-names></name><name><surname>Nott</surname><given-names>DJ</given-names></name><name><surname>Drovandi</surname><given-names>CC</given-names></name></person-group><year iso-8601-date="2019">2019</year><article-title>Accelerating bayesian synthetic likelihood with the graphical lasso</article-title><source>Journal of Computational and Graphical Statistics</source><volume>28</volume><fpage>471</fpage><lpage>475</lpage><pub-id pub-id-type="doi">10.1080/10618600.2018.1537928</pub-id></element-citation></ref><ref id="bib2"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Bezanson</surname><given-names>J</given-names></name><name><surname>Edelman</surname><given-names>A</given-names></name><name><surname>Karpinski</surname><given-names>S</given-names></name><name><surname>Shah</surname><given-names>VB</given-names></name></person-group><year iso-8601-date="2017">2017</year><article-title>Julia: A fresh approach to numerical computing</article-title><source>SIAM Review</source><volume>59</volume><fpage>65</fpage><lpage>98</lpage><pub-id pub-id-type="doi">10.1137/141000671</pub-id></element-citation></ref><ref id="bib3"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Bingham</surname><given-names>E</given-names></name><name><surname>Chen</surname><given-names>JP</given-names></name><name><surname>Jankowiak</surname><given-names>M</given-names></name><name><surname>Obermeyer</surname><given-names>F</given-names></name><name><surname>Pradhan</surname><given-names>N</given-names></name><name><surname>Karaletsos</surname><given-names>T</given-names></name><name><surname>Singh</surname><given-names>R</given-names></name><name><surname>Szerlip</surname><given-names>P</given-names></name><name><surname>Horsfall</surname><given-names>P</given-names></name><name><surname>Goodman</surname><given-names>ND</given-names></name></person-group><year iso-8601-date="2019">2019</year><article-title>Pyro: Deep universal probabilistic programming</article-title><source>Journal of Machine Learning Research</source><volume>20</volume><fpage>973</fpage><lpage>978</lpage></element-citation></ref><ref id="bib4"><element-citation publication-type="software"><person-group person-group-type="author"><name><surname>Boelts</surname><given-names>J</given-names></name></person-group><year iso-8601-date="2022">2022a</year><data-title>sbi: simulation-based inference</data-title><version designator="swh:1:rev:d72fc6d790285c7779afbbe9a5f6b640691d4560">swh:1:rev:d72fc6d790285c7779afbbe9a5f6b640691d4560</version><source>Software Heritage</source><ext-link ext-link-type="uri" xlink:href="https://archive.softwareheritage.org/swh:1:dir:d327a49bdf0127726927c70c6b216405196653d6;origin=https://github.com/mackelab/sbi;visit=swh:1:snp:ff58cc8344d88bbe9952872597690748adb5798b;anchor=swh:1:rev:d72fc6d790285c7779afbbe9a5f6b640691d4560">https://archive.softwareheritage.org/swh:1:dir:d327a49bdf0127726927c70c6b216405196653d6;origin=https://github.com/mackelab/sbi;visit=swh:1:snp:ff58cc8344d88bbe9952872597690748adb5798b;anchor=swh:1:rev:d72fc6d790285c7779afbbe9a5f6b640691d4560</ext-link></element-citation></ref><ref 
id="bib5"><element-citation publication-type="software"><person-group person-group-type="author"><name><surname>Boelts</surname><given-names>J</given-names></name></person-group><year iso-8601-date="2022">2022b</year><data-title>Mixed neural likelihood estimation for models of decision-making</data-title><version designator="swh:1:rev:5e6cf714c223ec5c414b76ac70f7dc88d4fbd321">swh:1:rev:5e6cf714c223ec5c414b76ac70f7dc88d4fbd321</version><source>Software Heritage</source><ext-link ext-link-type="uri" xlink:href="https://archive.softwareheritage.org/swh:1:dir:e20ac3b3d5324d29f262cc52388b3916526d2518;origin=https://github.com/mackelab/mnle-for-ddms;visit=swh:1:snp:b9707619efaa06519bb97d54db256cc99f78df3f;anchor=swh:1:rev:5e6cf714c223ec5c414b76ac70f7dc88d4fbd321">https://archive.softwareheritage.org/swh:1:dir:e20ac3b3d5324d29f262cc52388b3916526d2518;origin=https://github.com/mackelab/mnle-for-ddms;visit=swh:1:snp:b9707619efaa06519bb97d54db256cc99f78df3f;anchor=swh:1:rev:5e6cf714c223ec5c414b76ac70f7dc88d4fbd321</ext-link></element-citation></ref><ref id="bib6"><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Brooks</surname><given-names>S</given-names></name><name><surname>Gelman</surname><given-names>A</given-names></name><name><surname>Jones</surname><given-names>G</given-names></name><name><surname>Meng</surname><given-names>XL</given-names></name></person-group><year iso-8601-date="2011">2011</year><chapter-title>MCMC Using Hamiltonian Dynamics</chapter-title><person-group person-group-type="editor"><name><surname>Brooks</surname><given-names>S</given-names></name></person-group><source>Handbook of Markov Chain Monte Carlo</source><publisher-name>Elsevier</publisher-name><fpage>1</fpage><lpage>2</lpage><pub-id pub-id-type="doi">10.1201/b10905</pub-id></element-citation></ref><ref id="bib7"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Chandrasekaran</surname><given-names>C</given-names></name><name><surname>Hawkins</surname><given-names>GE</given-names></name></person-group><year iso-8601-date="2019">2019</year><article-title>ChaRTr: An R toolbox for modeling choices and response times in decision-making tasks</article-title><source>Journal of Neuroscience Methods</source><volume>328</volume><elocation-id>108432</elocation-id><pub-id pub-id-type="doi">10.1016/j.jneumeth.2019.108432</pub-id><pub-id pub-id-type="pmid">31586868</pub-id></element-citation></ref><ref id="bib8"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Churchland</surname><given-names>PS</given-names></name><name><surname>Sejnowski</surname><given-names>TJ</given-names></name></person-group><year iso-8601-date="1988">1988</year><article-title>Perspectives on cognitive neuroscience</article-title><source>Science</source><volume>242</volume><fpage>741</fpage><lpage>745</lpage><pub-id pub-id-type="doi">10.1126/science.3055294</pub-id><pub-id pub-id-type="pmid">3055294</pub-id></element-citation></ref><ref id="bib9"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Cook</surname><given-names>SR</given-names></name><name><surname>Gelman</surname><given-names>A</given-names></name><name><surname>Rubin</surname><given-names>DB</given-names></name></person-group><year iso-8601-date="2006">2006</year><article-title>Validation of software for bayesian models using posterior quantiles</article-title><source>Journal of Computational and Graphical Statistics</source><volume>15</volume><fpage>675</fpage><lpage>692</lpage><pub-id pub-id-type="doi">10.1198/106186006X136976</pub-id></element-citation></ref><ref id="bib10"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Cranmer</surname><given-names>K</given-names></name><name><surname>Brehmer</surname><given-names>J</given-names></name><name><surname>Louppe</surname><given-names>G</given-names></name></person-group><year iso-8601-date="2020">2020</year><article-title>The frontier of simulation-based inference</article-title><source>PNAS</source><volume>117</volume><fpage>30055</fpage><lpage>30062</lpage><pub-id pub-id-type="doi">10.1073/pnas.1912789117</pub-id><pub-id pub-id-type="pmid">32471948</pub-id></element-citation></ref><ref id="bib11"><element-citation publication-type="confproc"><person-group person-group-type="author"><name><surname>Dax</surname><given-names>M</given-names></name><name><surname>Green</surname><given-names>SR</given-names></name><name><surname>Gair</surname><given-names>J</given-names></name><name><surname>Deistler</surname><given-names>M</given-names></name><name><surname>Schölkopf</surname><given-names>B</given-names></name><name><surname>Macke</surname><given-names>JH</given-names></name></person-group><year iso-8601-date="2022">2022</year><article-title>Group equivariant neural posterior estimation</article-title><conf-name>In International Conference on Learning Representations</conf-name></element-citation></ref><ref id="bib12"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Drugowitsch</surname><given-names>J</given-names></name></person-group><year iso-8601-date="2016">2016</year><article-title>Fast and accurate Monte Carlo sampling of first-passage times from Wiener diffusion models</article-title><source>Scientific Reports</source><volume>6</volume><fpage>1</fpage><lpage>13</lpage><pub-id pub-id-type="doi">10.1038/srep20490</pub-id><pub-id pub-id-type="pmid">26864391</pub-id></element-citation></ref><ref id="bib13"><element-citation publication-type="confproc"><person-group 
person-group-type="author"><name><surname>Durkan</surname><given-names>C</given-names></name><name><surname>Bekasov</surname><given-names>A</given-names></name><name><surname>Murray</surname><given-names>I</given-names></name><name><surname>Papamakarios</surname><given-names>G</given-names></name></person-group><year iso-8601-date="2019">2019</year><article-title>Neural spline flows</article-title><conf-name>Advances in Neural Information Processing Systems</conf-name><fpage>7511</fpage><lpage>7522</lpage></element-citation></ref><ref id="bib14"><element-citation publication-type="confproc"><person-group person-group-type="author"><name><surname>Durkan</surname><given-names>C</given-names></name><name><surname>Murray</surname><given-names>I</given-names></name><name><surname>Papamakarios</surname><given-names>G</given-names></name></person-group><year iso-8601-date="2020">2020</year><article-title>On contrastive learning for likelihood-free inference</article-title><conf-name>In International Conference on Machine Learning</conf-name><fpage>2771</fpage><lpage>2781</lpage></element-citation></ref><ref id="bib15"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Feltgen</surname><given-names>Q</given-names></name><name><surname>Daunizeau</surname><given-names>J</given-names></name></person-group><year iso-8601-date="2021">2021</year><article-title>An overcomplete approach to fitting drift-diffusion decision models to trial-by-trial data</article-title><source>Frontiers in Artificial Intelligence</source><volume>4</volume><elocation-id>531316</elocation-id><pub-id pub-id-type="doi">10.3389/frai.2021.531316</pub-id><pub-id pub-id-type="pmid">33898982</pub-id></element-citation></ref><ref id="bib16"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Fengler</surname><given-names>A</given-names></name><name><surname>Govindarajan</surname><given-names>LN</given-names></name><name><surname>Chen</surname><given-names>T</given-names></name><name><surname>Frank</surname><given-names>MJ</given-names></name></person-group><year iso-8601-date="2021">2021</year><article-title>Likelihood approximation networks (LANs) for fast inference of simulation models in cognitive neuroscience</article-title><source>eLife</source><volume>10</volume><elocation-id>e65074</elocation-id><pub-id pub-id-type="doi">10.7554/eLife.65074</pub-id><pub-id pub-id-type="pmid">33821788</pub-id></element-citation></ref><ref id="bib17"><element-citation publication-type="confproc"><person-group person-group-type="author"><name><surname>Glöckler</surname><given-names>M</given-names></name><name><surname>Deistler</surname><given-names>M</given-names></name><name><surname>Macke</surname><given-names>JH</given-names></name></person-group><year iso-8601-date="2021">2021</year><article-title>Variational methods for simulation-based inference</article-title><conf-name>In International Conference on Learning Representations</conf-name></element-citation></ref><ref id="bib18"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Gonçalves</surname><given-names>PJ</given-names></name><name><surname>Lueckmann</surname><given-names>JM</given-names></name><name><surname>Deistler</surname><given-names>M</given-names></name><name><surname>Nonnenmacher</surname><given-names>M</given-names></name><name><surname>Öcal</surname><given-names>K</given-names></name><name><surname>Bassetto</surname><given-names>G</given-names></name><name><surname>Chintaluri</surname><given-names>C</given-names></name><name><surname>Podlaski</surname><given-names>WF</given-names></name><name><surname>Haddad</surname><given-names>SA</given-names></name><name><surname>Vogels</surname><given-names>TP</given-names></name><name><surname>Greenberg</surname><given-names>DS</given-names></name><name><surname>Macke</surname><given-names>JH</given-names></name></person-group><year iso-8601-date="2020">2020</year><article-title>Training deep neural density estimators to identify mechanistic models of neural dynamics</article-title><source>eLife</source><volume>9</volume><elocation-id>e56261</elocation-id><pub-id pub-id-type="doi">10.7554/eLife.56261</pub-id><pub-id pub-id-type="pmid">32940606</pub-id></element-citation></ref><ref id="bib19"><element-citation publication-type="confproc"><person-group person-group-type="author"><name><surname>Greenberg</surname><given-names>D</given-names></name><name><surname>Nonnenmacher</surname><given-names>M</given-names></name><name><surname>Macke</surname><given-names>J</given-names></name></person-group><year iso-8601-date="2019">2019</year><article-title>Automatic posterior transformation for likelihood-free inference</article-title><conf-name>In Proceedings of the 36th International Conference on Machine Learning of Proceedings of Machine Learning Research</conf-name><fpage>2404</fpage><lpage>2414</lpage></element-citation></ref><ref id="bib20"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Gutmann</surname><given-names>MU</given-names></name><name><surname>Corander</surname><given-names>J</given-names></name></person-group><year iso-8601-date="2016">2016</year><article-title>Bayesian optimization for likelihood-free inference of simulator-based statistical models</article-title><source>The Journal of Machine Learning Research</source><volume>17</volume><fpage>4256</fpage><lpage>4302</lpage></element-citation></ref><ref id="bib21"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Harris</surname><given-names>CR</given-names></name><name><surname>Millman</surname><given-names>KJ</given-names></name><name><surname>van der Walt</surname><given-names>SJ</given-names></name><name><surname>Gommers</surname><given-names>R</given-names></name><name><surname>Virtanen</surname><given-names>P</given-names></name><name><surname>Cournapeau</surname><given-names>D</given-names></name><name><surname>Wieser</surname><given-names>E</given-names></name><name><surname>Taylor</surname><given-names>J</given-names></name><name><surname>Berg</surname><given-names>S</given-names></name><name><surname>Smith</surname><given-names>NJ</given-names></name><name><surname>Kern</surname><given-names>R</given-names></name><name><surname>Picus</surname><given-names>M</given-names></name><name><surname>Hoyer</surname><given-names>S</given-names></name><name><surname>van Kerkwijk</surname><given-names>MH</given-names></name><name><surname>Brett</surname><given-names>M</given-names></name><name><surname>Haldane</surname><given-names>A</given-names></name><name><surname>Del 
Río</surname><given-names>JF</given-names></name><name><surname>Wiebe</surname><given-names>M</given-names></name><name><surname>Peterson</surname><given-names>P</given-names></name><name><surname>Gérard-Marchant</surname><given-names>P</given-names></name><name><surname>Sheppard</surname><given-names>K</given-names></name><name><surname>Reddy</surname><given-names>T</given-names></name><name><surname>Weckesser</surname><given-names>W</given-names></name><name><surname>Abbasi</surname><given-names>H</given-names></name><name><surname>Gohlke</surname><given-names>C</given-names></name><name><surname>Oliphant</surname><given-names>TE</given-names></name></person-group><year iso-8601-date="2020">2020</year><article-title>Array programming with NumPy</article-title><source>Nature</source><volume>585</volume><fpage>357</fpage><lpage>362</lpage><pub-id pub-id-type="doi">10.1038/s41586-020-2649-2</pub-id><pub-id pub-id-type="pmid">32939066</pub-id></element-citation></ref><ref id="bib22"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hawkins</surname><given-names>GE</given-names></name><name><surname>Forstmann</surname><given-names>BU</given-names></name><name><surname>Wagenmakers</surname><given-names>EJ</given-names></name><name><surname>Ratcliff</surname><given-names>R</given-names></name><name><surname>Brown</surname><given-names>SD</given-names></name></person-group><year iso-8601-date="2015">2015</year><article-title>Revisiting the evidence for collapsing boundaries and urgency signals in perceptual decision-making</article-title><source>The Journal of Neuroscience</source><volume>35</volume><fpage>2476</fpage><lpage>2484</lpage><pub-id pub-id-type="doi">10.1523/JNEUROSCI.2410-14.2015</pub-id><pub-id pub-id-type="pmid">25673842</pub-id></element-citation></ref><ref id="bib23"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Heathcote</surname><given-names>A</given-names></name><name><surname>Lin</surname><given-names>YS</given-names></name><name><surname>Reynolds</surname><given-names>A</given-names></name><name><surname>Strickland</surname><given-names>L</given-names></name><name><surname>Gretton</surname><given-names>M</given-names></name><name><surname>Matzke</surname><given-names>D</given-names></name></person-group><year iso-8601-date="2019">2019</year><article-title>Dynamic models of choice</article-title><source>Behavior Research Methods</source><volume>51</volume><fpage>961</fpage><lpage>985</lpage><pub-id pub-id-type="doi">10.3758/s13428-018-1067-y</pub-id><pub-id pub-id-type="pmid">29959755</pub-id></element-citation></ref><ref id="bib24"><element-citation publication-type="confproc"><person-group person-group-type="author"><name><surname>Hermans</surname><given-names>J</given-names></name><name><surname>Begy</surname><given-names>V</given-names></name><name><surname>Louppe</surname><given-names>G</given-names></name></person-group><year iso-8601-date="2020">2020</year><article-title>Likelihood-free MCMC with approximate likelihood ratios</article-title><conf-name>In Proceedings of the 37th International Conference on Machine Learning, Proceedings of Machine Learning Research</conf-name></element-citation></ref><ref id="bib25"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hoffman</surname><given-names>MD</given-names></name><name><surname>Gelman</surname><given-names>A</given-names></name></person-group><year iso-8601-date="2014">2014</year><article-title>The No-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo</article-title><source>Journal of Machine Learning Research: JMLR</source><volume>15</volume><fpage>1593</fpage><lpage>1623</lpage></element-citation></ref><ref id="bib26"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Järvenpää</surname><given-names>M</given-names></name><name><surname>Gutmann</surname><given-names>MU</given-names></name><name><surname>Pleska</surname><given-names>A</given-names></name><name><surname>Vehtari</surname><given-names>A</given-names></name><name><surname>Marttinen</surname><given-names>P</given-names></name></person-group><year iso-8601-date="2019">2019</year><article-title>Efficient acquisition rules for model-based approximate Bayesian computation</article-title><source>Bayesian Analysis</source><volume>14</volume><fpage>595</fpage><lpage>622</lpage><pub-id pub-id-type="doi">10.1214/18-BA1121</pub-id></element-citation></ref><ref id="bib27"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kangasrääsiö</surname><given-names>A</given-names></name><name><surname>Jokinen</surname><given-names>JP</given-names></name><name><surname>Oulasvirta</surname><given-names>A</given-names></name><name><surname>Howes</surname><given-names>A</given-names></name><name><surname>Kaski</surname><given-names>S</given-names></name></person-group><year iso-8601-date="2019">2019</year><article-title>Parameter inference for computational cognitive models with approximate Bayesian computation</article-title><source>Cognitive Science</source><volume>43</volume><elocation-id>e12738</elocation-id><pub-id pub-id-type="doi">10.1111/cogs.12738</pub-id></element-citation></ref><ref id="bib28"><element-citation publication-type="confproc"><person-group person-group-type="author"><name><surname>Kingma</surname><given-names>DP</given-names></name><name><surname>Ba</surname><given-names>J</given-names></name></person-group><year iso-8601-date="2015">2015</year><article-title>Adam: A method for stochastic optimization</article-title><conf-name>In Proceedings of the 3rd International Conference on Learning Representations, ICLR</conf-name></element-citation></ref><ref id="bib29"><element-citation 
publication-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname><given-names>MD</given-names></name></person-group><year iso-8601-date="2008">2008</year><article-title>Three case studies in the Bayesian analysis of cognitive models</article-title><source>Psychonomic Bulletin & Review</source><volume>15</volume><fpage>1</fpage><lpage>15</lpage><pub-id pub-id-type="doi">10.3758/pbr.15.1.1</pub-id><pub-id pub-id-type="pmid">18605474</pub-id></element-citation></ref><ref id="bib30"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname><given-names>MD</given-names></name></person-group><year iso-8601-date="2011">2011</year><article-title>How cognitive modeling can benefit from hierarchical Bayesian models</article-title><source>Journal of Mathematical Psychology</source><volume>55</volume><fpage>1</fpage><lpage>7</lpage><pub-id pub-id-type="doi">10.1016/j.jmp.2010.08.013</pub-id></element-citation></ref><ref id="bib31"><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Lee</surname><given-names>MD</given-names></name><name><surname>Wagenmakers</surname><given-names>EJ</given-names></name></person-group><year iso-8601-date="2014">2014</year><source>Bayesian Cognitive Modeling</source><publisher-name>Cambridge University Press</publisher-name><pub-id pub-id-type="doi">10.1017/CBO9781139087759</pub-id></element-citation></ref><ref id="bib32"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname><given-names>HS</given-names></name><name><surname>Betts</surname><given-names>S</given-names></name><name><surname>Anderson</surname><given-names>JR</given-names></name></person-group><year iso-8601-date="2016">2016</year><article-title>Learning problem-solving rules as search through a hypothesis space</article-title><source>Cognitive 
Science</source><volume>40</volume><fpage>1036</fpage><lpage>1079</lpage><pub-id pub-id-type="doi">10.1111/cogs.12275</pub-id><pub-id pub-id-type="pmid">26292648</pub-id></element-citation></ref><ref id="bib33"><element-citation publication-type="confproc"><person-group person-group-type="author"><name><surname>Lopez-Paz</surname><given-names>D</given-names></name><name><surname>Oquab</surname><given-names>M</given-names></name></person-group><year iso-8601-date="2017">2017</year><article-title>Revisiting classifier two-sample tests</article-title><conf-name>In 5th International Conference on Learning Representations, ICLR</conf-name></element-citation></ref><ref id="bib34"><element-citation publication-type="preprint"><person-group person-group-type="author"><name><surname>Lueckmann</surname><given-names>JM</given-names></name><name><surname>Goncalves</surname><given-names>PJ</given-names></name><name><surname>Bassetto</surname><given-names>G</given-names></name><name><surname>Öcal</surname><given-names>K</given-names></name><name><surname>Nonnenmacher</surname><given-names>M</given-names></name><name><surname>Macke</surname><given-names>JH</given-names></name></person-group><year iso-8601-date="2017">2017</year><article-title>Flexible Statistical Inference for Mechanistic Models of Neural Dynamics</article-title><source>arXiv</source><ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/1711.01861">https://arxiv.org/abs/1711.01861</ext-link></element-citation></ref><ref id="bib35"><element-citation publication-type="confproc"><person-group person-group-type="author"><name><surname>Lueckmann</surname><given-names>JM</given-names></name><name><surname>Bassetto</surname><given-names>G</given-names></name><name><surname>Karaletsos</surname><given-names>T</given-names></name><name><surname>Macke</surname><given-names>JH</given-names></name></person-group><year iso-8601-date="2019">2019</year><article-title>Likelihood-free inference with emulator 
networks</article-title><conf-name>In Proceedings of the 1st Symposium on Advances in Approximate Bayesian Inference, Proceedings of Machine Learning Research</conf-name><fpage>32</fpage><lpage>53</lpage></element-citation></ref><ref id="bib36"><element-citation publication-type="confproc"><person-group person-group-type="author"><name><surname>Lueckmann</surname><given-names>JM</given-names></name><name><surname>Boelts</surname><given-names>J</given-names></name><name><surname>Greenberg</surname><given-names>D</given-names></name><name><surname>Goncalves</surname><given-names>P</given-names></name><name><surname>Macke</surname><given-names>J</given-names></name></person-group><year iso-8601-date="2021">2021</year><article-title>Benchmarking simulation-based inference</article-title><conf-name>Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research</conf-name><fpage>343</fpage><lpage>351</lpage></element-citation></ref><ref id="bib37"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>McClelland</surname><given-names>JL</given-names></name></person-group><year iso-8601-date="2009">2009</year><article-title>The place of modeling in cognitive science</article-title><source>Topics in Cognitive Science</source><volume>1</volume><fpage>11</fpage><lpage>38</lpage><pub-id pub-id-type="doi">10.1111/j.1756-8765.2008.01003.x</pub-id><pub-id pub-id-type="pmid">25164798</pub-id></element-citation></ref><ref id="bib38"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Navarro</surname><given-names>DJ</given-names></name><name><surname>Fuss</surname><given-names>IG</given-names></name></person-group><year iso-8601-date="2009">2009</year><article-title>Fast and accurate calculations for first-passage times in Wiener diffusion models</article-title><source>Journal of Mathematical 
Psychology</source><volume>53</volume><fpage>222</fpage><lpage>230</lpage><pub-id pub-id-type="doi">10.1016/j.jmp.2009.02.003</pub-id></element-citation></ref><ref id="bib39"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Neal</surname><given-names>RM</given-names></name></person-group><year iso-8601-date="2003">2003</year><article-title>Slice sampling</article-title><source>The Annals of Statistics</source><volume>31</volume><elocation-id>e62461</elocation-id><pub-id pub-id-type="doi">10.1214/aos/1056562461</pub-id></element-citation></ref><ref id="bib40"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ong</surname><given-names>VMH</given-names></name><name><surname>Nott</surname><given-names>DJ</given-names></name><name><surname>Tran</surname><given-names>M-N</given-names></name><name><surname>Sisson</surname><given-names>SA</given-names></name><name><surname>Drovandi</surname><given-names>CC</given-names></name></person-group><year iso-8601-date="2018">2018</year><article-title>Variational Bayes with synthetic likelihood</article-title><source>Statistics and Computing</source><volume>28</volume><fpage>971</fpage><lpage>988</lpage><pub-id pub-id-type="doi">10.1007/s11222-017-9773-3</pub-id></element-citation></ref><ref id="bib41"><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Palestro</surname><given-names>JJ</given-names></name><name><surname>Sederberg</surname><given-names>PB</given-names></name><name><surname>Osth</surname><given-names>AF</given-names></name><name><surname>Van Zandt</surname><given-names>T</given-names></name><name><surname>Turner</surname><given-names>BM</given-names></name></person-group><year iso-8601-date="2018">2018</year><source>Likelihood-Free Methods for Cognitive Science</source><publisher-loc>Cham</publisher-loc><publisher-name>Springer</publisher-name><pub-id 
pub-id-type="doi">10.1007/978-3-319-72425-6</pub-id></element-citation></ref><ref id="bib42"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Palestro</surname><given-names>JJ</given-names></name><name><surname>Weichart</surname><given-names>E</given-names></name><name><surname>Sederberg</surname><given-names>PB</given-names></name><name><surname>Turner</surname><given-names>BM</given-names></name></person-group><year iso-8601-date="2018">2018</year><article-title>Some task demands induce collapsing bounds: Evidence from a behavioral analysis</article-title><source>Psychonomic Bulletin & Review</source><volume>25</volume><fpage>1225</fpage><lpage>1248</lpage><pub-id pub-id-type="doi">10.3758/s13423-018-1479-9</pub-id><pub-id pub-id-type="pmid">29845433</pub-id></element-citation></ref><ref id="bib43"><element-citation publication-type="software"><person-group person-group-type="author"><collab>pandas development team</collab></person-group><year iso-8601-date="2020">2020</year><data-title>Pandas-dev/pandas: pandas</data-title><version designator="6e1a040">6e1a040</version><source>Github</source><ext-link ext-link-type="uri" xlink:href="https://github.com/pandas-dev/pandas">https://github.com/pandas-dev/pandas</ext-link></element-citation></ref><ref id="bib44"><element-citation publication-type="confproc"><person-group person-group-type="author"><name><surname>Papamakarios</surname><given-names>G</given-names></name><name><surname>Murray</surname><given-names>I</given-names></name></person-group><year iso-8601-date="2016">2016</year><article-title>Fast ε-free Inference of Simulation Models with Bayesian Conditional Density Estimation</article-title><conf-name>In Advances in Neural Information Processing Systems</conf-name></element-citation></ref><ref id="bib45"><element-citation publication-type="preprint"><person-group 
person-group-type="author"><name><surname>Papamakarios</surname><given-names>G</given-names></name><name><surname>Nalisnick</surname><given-names>E</given-names></name><name><surname>Rezende</surname><given-names>DJ</given-names></name><name><surname>Mohamed</surname><given-names>S</given-names></name><name><surname>Lakshminarayanan</surname><given-names>B</given-names></name></person-group><year iso-8601-date="2019">2019a</year><article-title>Normalizing Flows for Probabilistic Modeling and Inference</article-title><source>arXiv</source><ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/1912.02762">https://arxiv.org/abs/1912.02762</ext-link></element-citation></ref><ref id="bib46"><element-citation publication-type="confproc"><person-group person-group-type="author"><name><surname>Papamakarios</surname><given-names>G</given-names></name><name><surname>Sterratt</surname><given-names>D</given-names></name><name><surname>Murray</surname><given-names>I</given-names></name></person-group><year iso-8601-date="2019">2019b</year><article-title>Sequential neural likelihood: Fast likelihood-free inference with autoregressive flows</article-title><conf-name>In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), Proceedings of Machine Learning Research</conf-name><fpage>837</fpage><lpage>848</lpage></element-citation></ref><ref id="bib47"><element-citation publication-type="confproc"><person-group 
person-group-type="author"><name><surname>Paszke</surname><given-names>A</given-names></name><name><surname>Gross</surname><given-names>S</given-names></name><name><surname>Massa</surname><given-names>F</given-names></name><name><surname>Lerer</surname><given-names>A</given-names></name><name><surname>Bradbury</surname><given-names>J</given-names></name><name><surname>Chanan</surname><given-names>G</given-names></name><name><surname>Killeen</surname><given-names>T</given-names></name><name><surname>Lin</surname><given-names>Z</given-names></name><name><surname>Gimelshein</surname><given-names>N</given-names></name><name><surname>Antiga</surname><given-names>L</given-names></name><name><surname>Desmaison</surname><given-names>A</given-names></name><name><surname>Kopf</surname><given-names>A</given-names></name><name><surname>Yang</surname><given-names>E</given-names></name><name><surname>DeVito</surname><given-names>Z</given-names></name><name><surname>Raison</surname><given-names>M</given-names></name><name><surname>Tejani</surname><given-names>A</given-names></name><name><surname>Chilamkurthy</surname><given-names>S</given-names></name><name><surname>Steiner</surname><given-names>B</given-names></name><name><surname>Fang</surname><given-names>L</given-names></name><name><surname>Bai</surname><given-names>J</given-names></name><name><surname>Chintala</surname><given-names>S</given-names></name></person-group><year iso-8601-date="2019">2019</year><article-title>PyTorch: An imperative style, high-performance deep learning library</article-title><conf-name>In Advances in Neural Information Processing Systems</conf-name><fpage>8024</fpage><lpage>8035</lpage></element-citation></ref><ref id="bib48"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Patil</surname><given-names>U</given-names></name><name><surname>Hanne</surname><given-names>S</given-names></name><name><surname>Burchert</surname><given-names>F</given-names></name><name><surname>De Bleser</surname><given-names>R</given-names></name><name><surname>Vasishth</surname><given-names>S</given-names></name></person-group><year iso-8601-date="2016">2016</year><article-title>A computational evaluation of sentence processing deficits in aphasia</article-title><source>Cognitive Science</source><volume>40</volume><fpage>5</fpage><lpage>50</lpage><pub-id pub-id-type="doi">10.1111/cogs.12250</pub-id><pub-id pub-id-type="pmid">26016698</pub-id></element-citation></ref><ref id="bib49"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Pedregosa</surname><given-names>F</given-names></name><name><surname>Varoquaux</surname><given-names>G</given-names></name><name><surname>Gramfort</surname><given-names>A</given-names></name><name><surname>Michel</surname><given-names>V</given-names></name><name><surname>Thirion</surname><given-names>B</given-names></name><name><surname>Grisel</surname><given-names>O</given-names></name><name><surname>Blondel</surname><given-names>M</given-names></name><name><surname>Prettenhofer</surname><given-names>P</given-names></name><name><surname>Weiss</surname><given-names>R</given-names></name><name><surname>Dubourg</surname><given-names>V</given-names></name><name><surname>Vanderplas</surname><given-names>J</given-names></name><name><surname>Passos</surname><given-names>A</given-names></name><name><surname>Cournapeau</surname><given-names>D</given-names></name><name><surname>Brucher</surname><given-names>M</given-names></name><name><surname>Perrot</surname><given-names>M</given-names></name><name><surname>Duchesnay</surname><given-names>E</given-names></name></person-group><year iso-8601-date="2011">2011</year><article-title>Scikit-learn: Machine 
learning in Python</article-title><source>Journal of Machine Learning Research</source><volume>12</volume><fpage>2825</fpage><lpage>2830</lpage></element-citation></ref><ref id="bib50"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Price</surname><given-names>LF</given-names></name><name><surname>Drovandi</surname><given-names>CC</given-names></name><name><surname>Lee</surname><given-names>A</given-names></name><name><surname>Nott</surname><given-names>DJ</given-names></name></person-group><year iso-8601-date="2018">2018</year><article-title>Bayesian Synthetic Likelihood</article-title><source>Journal of Computational and Graphical Statistics</source><volume>27</volume><fpage>1</fpage><lpage>11</lpage><pub-id pub-id-type="doi">10.1080/10618600.2017.1302882</pub-id></element-citation></ref><ref id="bib51"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Priddle</surname><given-names>JW</given-names></name><name><surname>Sisson</surname><given-names>SA</given-names></name><name><surname>Frazier</surname><given-names>DT</given-names></name><name><surname>Turner</surname><given-names>I</given-names></name><name><surname>Drovandi</surname><given-names>C</given-names></name></person-group><year iso-8601-date="2022">2022</year><article-title>Efficient Bayesian synthetic likelihood with whitening transformations</article-title><source>Journal of Computational and Graphical Statistics</source><volume>31</volume><fpage>50</fpage><lpage>63</lpage><pub-id pub-id-type="doi">10.1080/10618600.2021.1979012</pub-id></element-citation></ref><ref id="bib52"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Rackauckas</surname><given-names>C</given-names></name><name><surname>Nie</surname><given-names>Q</given-names></name></person-group><year iso-8601-date="2017">2017</year><article-title>DifferentialEquations.jl – a performant 
and feature-rich ecosystem for solving differential equations in Julia</article-title><source>Journal of Open Research Software</source><volume>5</volume><elocation-id>15</elocation-id><pub-id pub-id-type="doi">10.5334/jors.151</pub-id></element-citation></ref><ref id="bib53"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Radev</surname><given-names>ST</given-names></name><name><surname>Mertens</surname><given-names>UK</given-names></name><name><surname>Voss</surname><given-names>A</given-names></name><name><surname>Ardizzone</surname><given-names>L</given-names></name><name><surname>Kothe</surname><given-names>U</given-names></name></person-group><year iso-8601-date="2022">2022</year><article-title>BayesFlow: learning complex stochastic models with invertible neural networks</article-title><source>IEEE Transactions on Neural Networks and Learning Systems</source><volume>33</volume><fpage>1452</fpage><lpage>1466</lpage><pub-id pub-id-type="doi">10.1109/TNNLS.2020.3042395</pub-id><pub-id pub-id-type="pmid">33338021</pub-id></element-citation></ref><ref id="bib54"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ratcliff</surname><given-names>R</given-names></name><name><surname>Rouder</surname><given-names>JN</given-names></name></person-group><year iso-8601-date="1998">1998</year><article-title>Modeling response times for two-choice decisions</article-title><source>Psychological Science</source><volume>9</volume><fpage>347</fpage><lpage>356</lpage><pub-id pub-id-type="doi">10.1111/1467-9280.00067</pub-id></element-citation></ref><ref id="bib55"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ratcliff</surname><given-names>R</given-names></name><name><surname>McKoon</surname><given-names>G</given-names></name></person-group><year iso-8601-date="2008">2008</year><article-title>The diffusion decision model: 
theory and data for two-choice decision tasks</article-title><source>Neural Computation</source><volume>20</volume><fpage>873</fpage><lpage>922</lpage><pub-id pub-id-type="doi">10.1162/neco.2008.12-06-420</pub-id><pub-id pub-id-type="pmid">18085991</pub-id></element-citation></ref><ref id="bib56"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Reynolds</surname><given-names>AM</given-names></name><name><surname>Rhodes</surname><given-names>CJ</given-names></name></person-group><year iso-8601-date="2009">2009</year><article-title>The Lévy flight paradigm: random search patterns and mechanisms</article-title><source>Ecology</source><volume>90</volume><fpage>877</fpage><lpage>887</lpage><pub-id pub-id-type="doi">10.1890/08-0153.1</pub-id><pub-id pub-id-type="pmid">19449680</pub-id></element-citation></ref><ref id="bib57"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Schad</surname><given-names>DJ</given-names></name><name><surname>Betancourt</surname><given-names>M</given-names></name><name><surname>Vasishth</surname><given-names>S</given-names></name></person-group><year iso-8601-date="2021">2021</year><article-title>Toward a principled Bayesian workflow in cognitive science</article-title><source>Psychological Methods</source><volume>26</volume><fpage>103</fpage><lpage>126</lpage><pub-id pub-id-type="doi">10.1037/met0000275</pub-id><pub-id pub-id-type="pmid">32551748</pub-id></element-citation></ref><ref id="bib58"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Shiffrin</surname><given-names>RM</given-names></name><name><surname>Lee</surname><given-names>MD</given-names></name><name><surname>Kim</surname><given-names>W</given-names></name><name><surname>Wagenmakers</surname><given-names>EJ</given-names></name></person-group><year iso-8601-date="2008">2008</year><article-title>A survey of model evaluation 
approaches with a tutorial on hierarchical Bayesian methods</article-title><source>Cognitive Science</source><volume>32</volume><fpage>1248</fpage><lpage>1284</lpage><pub-id pub-id-type="doi">10.1080/03640210802414826</pub-id><pub-id pub-id-type="pmid">21585453</pub-id></element-citation></ref><ref id="bib59"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Shinn</surname><given-names>M</given-names></name><name><surname>Lam</surname><given-names>NH</given-names></name><name><surname>Murray</surname><given-names>JD</given-names></name></person-group><year iso-8601-date="2020">2020</year><article-title>A flexible framework for simulating and fitting generalized drift-diffusion models</article-title><source>eLife</source><volume>9</volume><elocation-id>e56938</elocation-id><pub-id pub-id-type="doi">10.7554/eLife.56938</pub-id><pub-id pub-id-type="pmid">32749218</pub-id></element-citation></ref><ref id="bib60"><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Sisson</surname><given-names>SA</given-names></name></person-group><year iso-8601-date="2018">2018</year><chapter-title>Overview of ABC</chapter-title><person-group person-group-type="editor"><name><surname>Sisson</surname><given-names>SA</given-names></name><name><surname>Fan</surname><given-names>Y</given-names></name><name><surname>Beaumont</surname><given-names>MA</given-names></name></person-group><source>Handbook of Approximate Bayesian Computation, Chapter 1</source><publisher-name>CRC Press, Taylor & Francis Group</publisher-name><fpage>1</fpage><lpage>678</lpage><pub-id pub-id-type="doi">10.1201/9781315117195</pub-id></element-citation></ref><ref id="bib61"><element-citation publication-type="preprint"><person-group 
person-group-type="author"><name><surname>Talts</surname><given-names>S</given-names></name><name><surname>Betancourt</surname><given-names>M</given-names></name><name><surname>Simpson</surname><given-names>D</given-names></name><name><surname>Vehtari</surname><given-names>A</given-names></name><name><surname>Gelman</surname><given-names>A</given-names></name></person-group><year iso-8601-date="2018">2018</year><article-title>Validating Bayesian Inference Algorithms with Simulation-Based Calibration</article-title><source>arXiv</source><ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/1804.06788">https://arxiv.org/abs/1804.06788</ext-link></element-citation></ref><ref id="bib62"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Tejero-Cantero</surname><given-names>A</given-names></name><name><surname>Boelts</surname><given-names>J</given-names></name><name><surname>Deistler</surname><given-names>M</given-names></name><name><surname>Lueckmann</surname><given-names>JM</given-names></name><name><surname>Durkan</surname><given-names>C</given-names></name><name><surname>Gonçalves</surname><given-names>PJ</given-names></name><name><surname>Greenberg</surname><given-names>DS</given-names></name><name><surname>Macke</surname><given-names>JH</given-names></name></person-group><year iso-8601-date="2020">2020</year><article-title>sbi: A toolkit for simulation-based inference</article-title><source>Journal of Open Source Software</source><volume>5</volume><elocation-id>2505</elocation-id><pub-id pub-id-type="doi">10.21105/joss.02505</pub-id></element-citation></ref><ref id="bib63"><element-citation publication-type="confproc"><person-group 
person-group-type="author"><name><surname>Tran</surname><given-names>D</given-names></name><name><surname>Vafa</surname><given-names>K</given-names></name><name><surname>Agrawal</surname><given-names>K</given-names></name><name><surname>Dinh</surname><given-names>L</given-names></name><name><surname>Poole</surname><given-names>B</given-names></name></person-group><year iso-8601-date="2019">2019</year><article-title>Discrete flows: Invertible generative models of discrete data</article-title><conf-name>Advances in Neural Information Processing Systems</conf-name><fpage>14719</fpage><lpage>14728</lpage></element-citation></ref><ref id="bib64"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Turner</surname><given-names>BM</given-names></name><name><surname>Van Zandt</surname><given-names>T</given-names></name></person-group><year iso-8601-date="2012">2012</year><article-title>A tutorial on approximate Bayesian computation</article-title><source>Journal of Mathematical Psychology</source><volume>56</volume><fpage>69</fpage><lpage>85</lpage><pub-id pub-id-type="doi">10.1016/j.jmp.2012.02.005</pub-id></element-citation></ref><ref id="bib65"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Turner</surname><given-names>BM</given-names></name><name><surname>van Maanen</surname><given-names>L</given-names></name><name><surname>Forstmann</surname><given-names>BU</given-names></name></person-group><year iso-8601-date="2015">2015</year><article-title>Informing cognitive abstractions through neuroimaging: the neural drift diffusion model</article-title><source>Psychological Review</source><volume>122</volume><fpage>312</fpage><lpage>336</lpage><pub-id pub-id-type="doi">10.1037/a0038894</pub-id><pub-id pub-id-type="pmid">25844875</pub-id></element-citation></ref><ref id="bib66"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Turner</surname><given-names>BM</given-names></name><name><surname>Van Zandt</surname><given-names>T</given-names></name></person-group><year iso-8601-date="2018">2018</year><article-title>Approximating Bayesian inference through model simulation</article-title><source>Trends in Cognitive Sciences</source><volume>22</volume><fpage>826</fpage><lpage>840</lpage><pub-id pub-id-type="doi">10.1016/j.tics.2018.06.003</pub-id><pub-id pub-id-type="pmid">30093313</pub-id></element-citation></ref><ref id="bib67"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Usher</surname><given-names>M</given-names></name><name><surname>McClelland</surname><given-names>JL</given-names></name></person-group><year iso-8601-date="2001">2001</year><article-title>The time course of perceptual choice: the leaky, competing accumulator model</article-title><source>Psychological Review</source><volume>108</volume><fpage>550</fpage><lpage>592</lpage><pub-id pub-id-type="doi">10.1037/0033-295x.108.3.550</pub-id><pub-id pub-id-type="pmid">11488378</pub-id></element-citation></ref><ref id="bib68"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>van Opheusden</surname><given-names>B</given-names></name><name><surname>Acerbi</surname><given-names>L</given-names></name><name><surname>Ma</surname><given-names>WJ</given-names></name></person-group><year iso-8601-date="2020">2020</year><article-title>Unbiased and efficient log-likelihood estimation with inverse binomial sampling</article-title><source>PLOS Computational Biology</source><volume>16</volume><elocation-id>e1008483</elocation-id><pub-id pub-id-type="doi">10.1371/journal.pcbi.1008483</pub-id><pub-id pub-id-type="pmid">33362195</pub-id></element-citation></ref><ref id="bib69"><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Van 
Rossum</surname><given-names>G</given-names></name><name><surname>Drake</surname><given-names>FL</given-names></name></person-group><year iso-8601-date="1995">1995</year><source>Python Tutorial</source><publisher-name>Centrum Voor Wiskunde En Informatica</publisher-name></element-citation></ref><ref id="bib70"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>von Krause</surname><given-names>M</given-names></name><name><surname>Radev</surname><given-names>ST</given-names></name><name><surname>Voss</surname><given-names>A</given-names></name></person-group><year iso-8601-date="2022">2022</year><article-title>Mental speed is high until age 60 as revealed by analysis of over a million participants</article-title><source>Nature Human Behaviour</source><volume>6</volume><fpage>700</fpage><lpage>708</lpage><pub-id pub-id-type="doi">10.1038/s41562-021-01282-7</pub-id><pub-id pub-id-type="pmid">35177809</pub-id></element-citation></ref><ref id="bib71"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Voss</surname><given-names>A</given-names></name><name><surname>Voss</surname><given-names>J</given-names></name></person-group><year iso-8601-date="2007">2007</year><article-title>Fast-dm: A free program for efficient diffusion model analysis</article-title><source>Behavior Research Methods</source><volume>39</volume><fpage>767</fpage><lpage>775</lpage><pub-id pub-id-type="doi">10.3758/bf03192967</pub-id><pub-id pub-id-type="pmid">18183889</pub-id></element-citation></ref><ref id="bib72"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wagenmakers</surname><given-names>EJ</given-names></name><name><surname>van der Maas</surname><given-names>HLJ</given-names></name><name><surname>Grasman</surname><given-names>R</given-names></name></person-group><year iso-8601-date="2007">2007</year><article-title>An EZ-diffusion model for 
response time and accuracy</article-title><source>Psychonomic Bulletin & Review</source><volume>14</volume><fpage>3</fpage><lpage>22</lpage><pub-id pub-id-type="doi">10.3758/bf03194023</pub-id><pub-id pub-id-type="pmid">17546727</pub-id></element-citation></ref><ref id="bib73"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wiecki</surname><given-names>TV</given-names></name><name><surname>Sofer</surname><given-names>I</given-names></name><name><surname>Frank</surname><given-names>MJ</given-names></name></person-group><year iso-8601-date="2013">2013</year><article-title>HDDM: Hierarchical bayesian estimation of the drift-diffusion model in python</article-title><source>Frontiers in Neuroinformatics</source><volume>7</volume><elocation-id>14</elocation-id><pub-id pub-id-type="doi">10.3389/fninf.2013.00014</pub-id><pub-id pub-id-type="pmid">23935581</pub-id></element-citation></ref><ref id="bib74"><element-citation publication-type="preprint"><person-group person-group-type="author"><name><surname>Wiqvist</surname><given-names>S</given-names></name><name><surname>Frellsen</surname><given-names>J</given-names></name><name><surname>Picchini</surname><given-names>U</given-names></name></person-group><year iso-8601-date="2021">2021</year><article-title>Sequential Neural Posterior and Likelihood Approximation</article-title><source>arXiv</source><ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/2102.06522">https://arxiv.org/abs/2102.06522</ext-link></element-citation></ref><ref id="bib75"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wood</surname><given-names>SN</given-names></name></person-group><year iso-8601-date="2010">2010</year><article-title>Statistical inference for noisy nonlinear ecological dynamic systems</article-title><source>Nature</source><volume>466</volume><fpage>1102</fpage><lpage>1104</lpage><pub-id 
pub-id-type="doi">10.1038/nature09319</pub-id><pub-id pub-id-type="pmid">20703226</pub-id></element-citation></ref></ref-list><app-group><app id="appendix-1"><title>Appendix 1</title><sec sec-type="appendix" id="s8"><title>Code availability</title><p>We implemented MNLE as part of the open-source package for SBI, sbi, available at <ext-link ext-link-type="uri" xlink:href="https://github.com/mackelab/sbi">https://github.com/mackelab/sbi</ext-link>, (<xref ref-type="bibr" rid="bib4">Boelts, 2022a</xref> copy archived at <ext-link ext-link-type="uri" xlink:href="https://archive.softwareheritage.org/swh:1:dir:d327a49bdf0127726927c70c6b216405196653d6;origin=https://github.com/mackelab/sbi;visit=swh:1:snp:ff58cc8344d88bbe9952872597690748adb5798b;anchor=swh:1:rev:d72fc6d790285c7779afbbe9a5f6b640691d4560">swh:1:rev:d72fc6d790285c7779afbbe9a5f6b640691d4560</ext-link>). Code for reproducing the results presented here, and tutorials on how to apply MNLE to other simulators using sbi can be found at <ext-link ext-link-type="uri" xlink:href="https://github.com/mackelab/mnle-for-ddms">https://github.com/mackelab/mnle-for-ddms</ext-link>, (<xref ref-type="bibr" rid="bib5">Boelts, 2022b</xref> copy archived at <ext-link ext-link-type="uri" xlink:href="https://archive.softwareheritage.org/swh:1:dir:e20ac3b3d5324d29f262cc52388b3916526d2518;origin=https://github.com/mackelab/mnle-for-ddms;visit=swh:1:snp:b9707619efaa06519bb97d54db256cc99f78df3f;anchor=swh:1:rev:5e6cf714c223ec5c414b76ac70f7dc88d4fbd321">swh:1:rev:5e6cf714c223ec5c414b76ac70f7dc88d4fbd321</ext-link>). 
The implementation of MNLE relies on packages developed by the Python (<xref ref-type="bibr" rid="bib69">Van Rossum and Drake, 1995</xref>) and Julia (<xref ref-type="bibr" rid="bib2">Bezanson et al., 2017</xref>) communities, including DifferentialEquations.jl (<xref ref-type="bibr" rid="bib52">Rackauckas and Nie, 2017</xref>), DiffModels.jl (<xref ref-type="bibr" rid="bib12">Drugowitsch, 2016</xref>), NumPy (<xref ref-type="bibr" rid="bib21">Harris et al., 2020</xref>), pandas (<xref ref-type="bibr" rid="bib43">pandas development team, 2020</xref>), Pyro (<xref ref-type="bibr" rid="bib3">Bingham et al., 2019</xref>), PyTorch (<xref ref-type="bibr" rid="bib47">Paszke et al., 2019</xref>), sbi (<xref ref-type="bibr" rid="bib62">Tejero-Cantero et al., 2020</xref>), sbibm (<xref ref-type="bibr" rid="bib36">Lueckmann et al., 2021</xref>), and Scikit-learn (<xref ref-type="bibr" rid="bib49">Pedregosa et al., 2011</xref>).</p></sec><sec sec-type="appendix" id="s9"><title>Architecture and training hyperparameters</title><p>For the Bernoulli neural network we used three hidden layers with 10 units each and sigmoid activation functions. For the neural spline flow architecture (<xref ref-type="bibr" rid="bib13">Durkan et al., 2019</xref>), we transformed the reaction time data to the log-domain, used a standard normal base distribution, 2 spline transforms with 5 bins each and conditioning networks with 3 hidden layers and 10 hidden units each, and rectified linear unit activation functions. 
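As a concrete illustration of the choice-network half of this architecture, the following NumPy sketch builds a randomly initialized stand-in with exactly the stated shape (three hidden layers of 10 units, sigmoid activations); it is not the trained PyTorch network used in the paper, the parameter names are hypothetical, and the conditional neural spline flow for reaction times is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes, rng):
    """Random initialization for a fully connected net; sizes = [n_in, h1, ..., n_out]."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def choice_prob(theta, params):
    """Forward pass of the Bernoulli choice network: three hidden layers
    of 10 units with sigmoid activations, and a sigmoid output unit
    giving P(choice = up | theta)."""
    h = theta
    for W, b in params[:-1]:
        h = sigmoid(h @ W + b)
    W, b = params[-1]
    return sigmoid(h @ W + b)

# e.g., four DDM parameters (drift, boundary, starting point, non-decision time)
params = init_mlp([4, 10, 10, 10, 1], rng)
theta = rng.normal(size=(5, 4))   # a batch of five parameter settings
p = choice_prob(theta, params)    # shape (5, 1), values strictly in (0, 1)
```

In the full MNLE, this network is trained jointly on simulated (parameter, choice) pairs, and the reaction-time density p(rt | theta, choice) is modeled by the conditional neural spline flow described above.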
The neural network training was performed using the sbi package with the following settings: learning rate 0.0005; training batch size 100; 10% of the training data held out as validation data; training stopped after 20 epochs without improvement in validation loss.</p></sec><sec sec-type="appendix" id="s10"><title>The emulator property of MNLE</title><p>Being based on the neural likelihood estimation framework, MNLE naturally returns an emulator of the simulator that can be sampled to generate synthetic data without running the simulator. We found that the synthetic data generated by MNLE accurately matched the data we obtained by running the DDM simulator (<xref ref-type="fig" rid="fig2s1">Figure 2—figure supplement 1</xref>). This has several potential benefits: it can help with evaluating the performance of the density estimator; it enables almost instantaneous data generation (one forward pass through the neural network) even if the simulator is computationally expensive; and it gives full access to the internals of the emulator, for example, to gradients w.r.t. data or parameters.</p><p>There is a variant of the LAN approach that allows for sampling synthetic data as well: in the ‘Histogram-approach’ (<xref ref-type="bibr" rid="bib16">Fengler et al., 2021</xref>), LANs are trained with a convolutional neural network (CNN) architecture using likelihood targets in the form of two-dimensional empirical histograms. The output of the CNN is a probability distribution over a discretized version of the data space, which can, in principle, be sampled to generate synthetic DDM choices and reaction times. 
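Sampling from such a discretized output could be sketched as follows; the bin count, grid range, and the random probability table below are purely illustrative placeholders, not values taken from Fengler et al., 2021:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative discretized output: a normalized probability table over
# 2 choices x 512 reaction-time bins, standing in for the CNN's output.
n_bins = 512
logits = rng.normal(size=(2, n_bins))
probs = np.exp(logits) / np.exp(logits).sum()

# Reaction-time grid: bin centers on an illustrative [0, 5] second range,
# so the RT resolution is limited to the bin width (5/512, i.e. ~10 ms).
edges = np.linspace(0.0, 5.0, n_bins + 1)
centers = 0.5 * (edges[:-1] + edges[1:])

def sample_discretized(probs, centers, n, rng):
    """Draw synthetic (choice, rt) pairs from the binned distribution:
    sample a flat bin index, then map it back to (choice, rt-bin)."""
    flat = rng.choice(probs.size, size=n, p=probs.ravel())
    choice, rt_bin = np.unravel_index(flat, probs.shape)
    return choice, centers[rt_bin]

choice, rt = sample_discretized(probs, centers, 1000, rng)
```

The sketch makes the resolution limit concrete: every sampled reaction time is snapped to one of the 512 bin centers, whereas a continuous density estimator such as MNLE's flow can produce arbitrary reaction-time values.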
However, the accuracy of this emulator property of CNN-LANs is limited by the number of bins used to approximate the continuous data space (e.g., 512 bins for the examples shown in <xref ref-type="bibr" rid="bib16">Fengler et al., 2021</xref>).</p></sec></app></app-group></back><sub-article article-type="editor-report" id="sa0"><front-stub><article-id pub-id-type="doi">10.7554/eLife.77220.sa0</article-id><title-group><article-title>Editor's evaluation</article-title></title-group><contrib-group><contrib contrib-type="author"><name><surname>Wyart</surname><given-names>Valentin</given-names></name><role specific-use="editor">Reviewing Editor</role><aff><institution-wrap><institution-id institution-id-type="ror">https://ror.org/013cjyk83</institution-id><institution>École normale supérieure, PSL University, INSERM</institution></institution-wrap><country>France</country></aff></contrib></contrib-group><related-object id="sa0ro1" object-id-type="id" object-id="10.1101/2021.12.22.473472" link-type="continued-by" xlink:href="https://sciety.org/articles/activity/10.1101/2021.12.22.473472"/></front-stub><body><p>This paper provides a new approach, Mixed Neural Likelihood Estimator (MNLE) to build likelihood emulators for simulation-based models where the likelihood is unavailable. The authors show that the MNLE approach is equally accurate but orders of magnitude more efficient than a recent proposal, likelihood approximation networks (LAN), on two variants of the drift-diffusion model (a widely used model in cognitive neuroscience). 
This work provides a practical approach for fitting more complex models of behavior and neural activity for which likelihoods are unavailable.</p></body></sub-article><sub-article article-type="decision-letter" id="sa1"><front-stub><article-id pub-id-type="doi">10.7554/eLife.77220.sa1</article-id><title-group><article-title>Decision letter</article-title></title-group><contrib-group content-type="section"><contrib contrib-type="editor"><name><surname>Wyart</surname><given-names>Valentin</given-names></name><role>Reviewing Editor</role><aff><institution-wrap><institution-id institution-id-type="ror">https://ror.org/013cjyk83</institution-id><institution>École normale supérieure, PSL University, INSERM</institution></institution-wrap><country>France</country></aff></contrib></contrib-group><contrib-group><contrib contrib-type="reviewer"><name><surname>Acerbi</surname><given-names>Luigi</given-names></name><role>Reviewer</role><aff><institution-wrap><institution-id institution-id-type="ror">https://ror.org/040af2s02</institution-id><institution>University of Helsinki</institution></institution-wrap><country>Finland</country></aff></contrib><contrib contrib-type="reviewer"><name><surname>Daunizeau</surname><given-names>Jean</given-names></name><role>Reviewer</role><aff><institution-wrap><institution-id institution-id-type="ror">https://ror.org/02vjkv261</institution-id><institution>Inserm</institution></institution-wrap><country>France</country></aff></contrib></contrib-group></front-stub><body><boxed-text id="sa2-box1"><p>Our editorial process produces two outputs: (i) <ext-link ext-link-type="uri" xlink:href="https://sciety.org/articles/activity/10.1101/2021.12.22.473472">public reviews</ext-link> designed to be posted alongside <ext-link ext-link-type="uri" xlink:href="https://www.biorxiv.org/content/10.1101/2021.12.22.473472v2">the preprint</ext-link> for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, 
shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.</p></boxed-text><p><bold>Decision letter after peer review:</bold></p><p>Thank you for submitting your article "Flexible and efficient simulation-based inference for models of decision-making" for consideration by <italic>eLife</italic>. Your article has been reviewed by 2 peer reviewers, and the evaluation has been overseen by Valentin Wyart as the Reviewing Editor and Timothy Behrens as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Luigi Acerbi (Reviewer #1); Jean Daunizeau (Reviewer #2).</p><p>The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission. As you will see after reading the two reviews (copied at the bottom of this message), the two reviewers have found that the MNLE approach you describe represents a significant advance over LANs. As summarized by Reviewer #1, instead of estimating the likelihood *independently* for each parameter setting and then training a neural network to interpolate between these density estimates, MNLE uses *conditional* density estimation, which trains a density network while sharing information across different parameter settings. This approach, and others built on top of it or inspired by it, may have a large impact in the field, even more so since all code has been made available by the authors.</p><p>But while this approach is a priori orders of magnitude more efficient than LANs, the reviewers have agreed that the current manuscript does not provide sufficient empirical evidence that this is the case. The reviewers have discussed how in practice the comparison between LANs and MNLEs could be improved to strengthen the manuscript in this respect. 
Furthermore, the reviewers have also identified limitations of the approach (shared with LANs) that are worth discussing more explicitly in the paper. We hope that you will find the reviews helpful when revising your manuscript. When submitting your revised manuscript, please provide a point-by-point response to the essential revisions that were decided after discussion between the reviewers. The other points made in the separate reviews (provided at the bottom of this message) should also be addressed, but they do not require a point-by-point response.</p><p>Essential revisions:</p><p>1. While the MNLE approach appears much more sample-efficient than the LAN approach, there is not sufficient empirical evidence that it is indeed the case. The main piece of evidence is described in Section 2.2, where the authors show that MNLE learns likelihood function as accurately as LAN, but with only a fraction of the simulation budget. The authors show that a MNLE estimator trained with 10^5 model simulations is as accurate as a LAN estimator trained with 10^11 model simulations. However, the authors did not show that a LAN estimator trained with only 10^5 model simulations yields less accurate results than a MNLE estimator trained with the same simulation budget, or that a MNLE estimator trained with 10^11 simulations yields much better results than a LAN estimator trained with the same simulation budget. A more interpretable comparison of the two approaches is needed here: claiming that MNLE is more efficient than LAN is important, but requires additional work. We do not require the authors to systematically vary the number of model simulations (spanning the 10^5 to 10^11 range), and quantify each method's efficiency in terms of accuracy profile over this range. 
The authors could compare their MNLE estimator trained with 10^5 model simulations with a LAN estimator trained on 10^8 samples, which is still 1000 times more than their MNLE estimator, and show that the LAN estimator performs much worse than their MNLE estimator. Note that there is an additional technical caveat here, in that training LANs depends on an additional degree of freedom = how many parameters / how many samples per parameter. The authors should keep a sensible split. The current split appears to be 10^6 parameters and 10^5 samples per parameter. When going down to 10^8, it would make sense to keep a similar ratio.</p><p>2. Additional "accuracy" metrics should be considered when comparing MNLE and LAN estimators. These metrics may include likelihood approximation accuracy, parameter estimation accuracy and estimation uncertainty accuracy. The last two are important in practice, because errors in likelihood approximations may have a very small impact on parameter estimation. Regarding parameter estimation accuracy, the metric used in the manuscript (the "absolute difference in posterior sample mean normalized by reference posterior standard deviation") may not be the most informative. Indeed, significant differences between MNLEs and LANs in this metric may have no practical consequence whatsoever in terms of absolute parameter estimation errors. This is what Figure 4A seems to indicate. It could be useful to use metrics that relate to parameter recovery, e.g., parameter estimation error or pairwise parameter confusion. LAN estimators may show more pairwise parameter confusion than MNLE estimators, for example.</p><p>3. The following limitations of MNLE estimators (limitations shared with LAN estimators) should be discussed more explicitly in the manuscript:</p><p>– The drift-diffusion model used to benchmark the approach has a very low-dimensional observation structure (one binary observation + one continuous observation per trial). 
This limitation is not necessarily a problem because many models in cognitive neuroscience have a very low-dimensional observation structure, but it is worth mentioning.</p><p>– The approach described works for i.i.d. data. Any additional structure/dependence (e.g., adding parameters to characterize the stimulus shown in the trial) effectively increases the dimensionality of the likelihood approximation the network needs to learn.</p><p>– The manuscript does not sufficiently discuss nor explore the scalability of the MNLE approach. Given the previous related work by some of the authors, that has shown remarkable results in this respect, it would be useful to discuss the scalability of MNLEs in the discussion.</p><p>– Like any neural-network based approach, selecting the hyperparameters and architecture of the network requires either prior knowledge and/or brute force. And it is currently unclear how in practice the MNLE approach can be applied for different models. Importantly, even if training a MNLE estimator *given the hyperparameters* might require a small number of simulations (10^5), we need to account for the additional cost of finding the correct hyperparameters for training the emulator (presumably, requiring an exploration of at least 10-100 hyperparameter settings).</p><p><italic>Reviewer #1 (Recommendations for the authors):</italic></p><p>I already discussed an earlier version of the manuscript with the authors, so I don't really have much to add to my previous comments and to my public comments (i.e., there are a couple of limitations that the authors could address a bit more explicitly, like dimensionality of the method and the cost of hyperparameter tuning).</p><p><italic>Reviewer #2 (Recommendations for the authors):</italic></p><p>Let me now explain why I think that authors' main claim (namely: that MNLE is more efficient than LAN) may not be strongly supported by the results reported in the current version of the manuscript.</p><p>The main piece 
of evidence is summarized in Section 2.2, where the authors claim that MNLE learns the likelihood function as accurately as LAN, but with only a fraction of the simulation budget. In brief, the authors showed that an MNLE estimator trained with 10^5 model simulations is as accurate as a pre-trained version of the LAN method for DDMs (which used 10^11 model simulations). The problem here is that they did not show that a LAN estimator trained with only 10^5 model simulations yields less accurate results. Or that MNLE trained with 10^11 simulations yields much better results. More generally, what is needed here is a quantitative comparison of the efficiency of the methods. One would expect that, for both MNLE and LAN, increasing the number of model simulations increases the accuracy of the likelihood approximation. However, this probably follows the law of diminishing marginal utility (i.e. using 10^11 simulations may bring little advantage when compared to e.g. 10^10). Hence, efficiency here should be measured in terms of the speed at which the resulting accuracy increases. In other terms, claiming that MNLE is more efficient than LAN is important, but requires more work than what is done here. Authors should systematically vary the number of model simulations (spanning, e.g. the 10^5 to 10^11 range), and quantify each method's efficiency in terms of the accuracy profile over this range.</p><p>Now, this analysis should be performed with different "accuracy" metrics. In particular, authors may measure likelihood approximation accuracy, parameter estimation accuracy and estimation uncertainty accuracy. The latter two are critical, because errors in likelihood approximations may have a very small impact on parameter estimation. I note that, w.r.t. parameter estimation accuracy and estimation uncertainty accuracy, I don't think the authors chose very informative metrics. 
For example, the former is defined as the "absolute difference in posterior sample mean normalized by reference posterior standard deviation". If I understood correctly, this metric may show significant differences between MNLE and LAN that would have no practical consequence whatsoever, given that methods would be very similar in terms of their absolute parameter estimation errors. In fact, this is what Figure 4A seems to indicate. I would rather suggest to use accuracy metrics that practically "mean something" for parameter recovery. Measuring parameter estimation error is a possibility (although one may then conclude that MNLE brings no clear advantage when compared to LAN, cf. Figure 4A). But there are other possibilities. For example, likelihood approximation errors may induce pairwise parameter confusions. In turn, for a small simulation budget, LAN estimators may show more pairwise parameter confusion than MNLE. Note: to quantify pairwise parameter confusion in the context of DDM, authors may take inspiration from Feltgen 2021 (or any other metric of their choice).</p></body></sub-article><sub-article article-type="reply" id="sa2"><front-stub><article-id pub-id-type="doi">10.7554/eLife.77220.sa2</article-id><title-group><article-title>Author response</article-title></title-group></front-stub><body><disp-quote content-type="editor-comment"><p>The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission. As you will see after reading the two reviews (copied at the bottom of this message), the two reviewers have found that the MNLE approach you describe represents a significant advance over LANs. 
As summarized by Reviewer #1, instead of estimating the likelihood *independently* for each parameter setting and then training a neural network to interpolate between these density estimates, MNLE uses *conditional* density estimation, which trains a density network while sharing information across different parameter settings. This approach, and others built on top of it or inspired by it, may have a large impact in the field, even more so since all code has been made available by the authors.</p><p>But while this approach is a priori orders of magnitude more efficient than LANs, the reviewers have agreed that the current manuscript does not provide sufficient empirical evidence that this is the case. The reviewers have discussed how in practice the comparison between LANs and MNLEs could be improved to strengthen the manuscript in this respect. Furthermore, the reviewers have also identified limitations of the approach (shared with LANs) that are worth discussing more explicitly in the paper. We hope that you will find the reviews helpful when revising your manuscript. When submitting your revised manuscript, please provide a point-by-point response to the essential revisions that were decided after discussion between the reviewers. The other points made in the separate reviews (provided at the bottom of this message) should also be addressed, but they do not require a point-by-point response.</p></disp-quote><p>We appreciate that the reviewers view our approach as a significant advance over LANs, with a potentially large impact on the field. We agree that the previous version of the manuscript lacked a direct comparison between MNLE and LANs at a similar simulation budget. We now provide results for these experiments below and show that MNLE is significantly more accurate than LANs across several simulation budgets. 
We now also provide further discussions regarding limitations of the MNLE.</p><disp-quote content-type="editor-comment"><p>Essential revisions:</p><p>1. While the MNLE approach appears much more sample-efficient than the LAN approach, there is not sufficient empirical evidence that it is indeed the case. The main piece of evidence is described in Section 2.2, where the authors show that MNLE learns likelihood function as accurately as LAN, but with only a fraction of the simulation budget. The authors show that a MNLE estimator trained with 10^5 model simulations is as accurate as a LAN estimator trained with 10^11 model simulations. However, the authors did not show that a LAN estimator trained with only 10^5 model simulations yields less accurate results than a MNLE estimator trained with the same simulation budget, or that a MNLE estimator trained with 10^11 simulations yields much better results than a LAN estimator trained with the same simulation budget. A more interpretable comparison of the two approaches is needed here: claiming that MNLE is more efficient than LAN is important, but requires additional work. We do not require the authors to systematically vary the number of model simulations (spanning the 10^5 to 10^11 range), and quantify each method's efficiency in terms of accuracy profile over this range. The authors could compare their MNLE estimator trained with 10^5 model simulations with a LAN estimator trained on 10^8 samples, which is still 1000 times more than their MNLE estimator, and show that the LAN estimator performs much worse than their MNLE estimator. Note that there is an additional technical caveat here, in that training LANs depends on an additional degree of freedom = how many parameters / how many samples per parameter. The authors should keep a sensible split. The current split appears to be 10^6 parameters and 10^5 samples per parameter. 
When going down to 10^8, it would make sense to keep a similar ratio.</p></disp-quote><p>The revised manuscript includes additional experiments to show that MNLE is more simulation-efficient than LANs, as requested, with updated main figures. Note that the authors of the LAN paper recently provided an updated PyTorch implementation of LAN, which now made it feasible to retrain LANs at smaller budgets locally, instead of having to exclusively rely on the pre-trained network at a budget of 10<sup>11</sup>, as we had done previously.</p><p>We retrained LANs for budgets of 10<sup>5</sup> and 10<sup>8</sup> simulations. For the 10<sup>5</sup> budget we used 10<sup>2</sup> parameter samples and 10<sup>3</sup> samples per parameter, respectively (in contrast to the split used in the LAN paper, we used more samples per parameter than parameter samples, which resulted in slightly better performance than the other way around). For the 10<sup>8</sup> budget we used an equal split of 10<sup>4</sup> and 10<sup>4</sup>.</p><p>As neural network-training settings, we used the defaults provided in the new LAN implementation. As training data, we used the data from the same DDM simulator as used for MNLE, except for the 10<sup>11</sup> LAN budget for which simulation and training was computationally too expensive so that we used the updated pre-trained network (the training data still came from the same distribution (simple DDM), just not from our own simulator).</p><p>We found that MNLE with a 10<sup>5</sup> budget (henceforth referred to as MNLE<sup>5</sup>) significantly outperformed LAN trained with 10<sup>5</sup> and 10<sup>8</sup> budgets (LAN<sup>5</sup> and LAN<sup>8</sup>, respectively) on all metrics (see updated Figure 2 below). When compared to LAN<sup>11</sup>, MNLE<sup>5</sup> was more accurate on all but one metric, the Huber loss on the log-likelihoods (LAN’s training loss). 
Note that the numbers for MNLE<sup>5</sup> and LAN<sup>11</sup> differ slightly from our initial submission because MNLE performance is now averaged across ten random neural network initializations (a subset of which outperformed the single pre-trained LAN<sup>11</sup> network) and because we used new implementations of LAN (provided by the authors) and of MNLE (now fully integrated into the <italic>sbi</italic> toolbox).</p><p>We updated Figure 2 with the new results and the following changes: we added the new results for the smaller LAN simulation budgets as lighter shades of orange; we show the metric results as bar plots with standard error of the mean (SEM) error bars instead of boxplots, to improve readability; and we replaced panels D (evaluation time) and E (budgets) with the Huber loss and MSE calculated on log-likelihoods (previously shown in panels B and C), now showing the Huber loss and MSE calculated on likelihoods in panels B and C. The main text is updated accordingly (not copied here due to length). We also still mention the results on evaluation time in the main text, and cover inference time of LAN and MNLE in general in the discussion as well.</p><disp-quote content-type="editor-comment"><p>2. Additional "accuracy" metrics should be considered when comparing MNLE and LAN estimators. These metrics may include likelihood approximation accuracy, parameter estimation accuracy and estimation uncertainty accuracy. The last two are important in practice, because errors in likelihood approximations may have a very small impact on parameter estimation. Regarding parameter estimation accuracy, the metric used in the manuscript (the "absolute difference in posterior sample mean normalized by reference posterior standard deviation") may not be the most informative. Indeed, significant differences between MNLEs and LANs in this metric may have no practical consequence whatsoever in terms of absolute parameter estimation errors. 
This is what Figure 4A seems to indicate. It could be useful to use metrics that relate to parameter recovery, e.g., parameter estimation error or pairwise parameter confusion. LAN estimators may show more pairwise parameter confusion than MNLE estimators, for example.</p></disp-quote><p>We thank the reviewers for pointing this out and agree that it is important to check how errors in likelihood approximation accuracy impact inference accuracy in practice, and to report parameter estimation accuracy with a separate metric. To evaluate both posterior accuracy and parameter estimation accuracy, we now additionally report the mean squared error (MSE) between the posterior sample mean and the underlying ground-truth parameters. We updated Figure 3 as follows: in panels B and C we show relative errors in posterior mean and posterior variance (as before), and in panel D we now show the parameter estimation accuracy for all methods, as well as for the reference posterior as a baseline for the best possible accuracy (this panel showed the dispersion metric before). Note that we average the parameter estimation metric over the four parameter dimensions to keep the figure compact; this did not affect results qualitatively (we added a supplementary figure showing metrics for each parameter dimension separately, Figure 3—figure supplement 1, see end of document). Panel E shows C2ST scores as before. All panels were updated with the corresponding results for inference with LAN budgets 10<sup>5</sup> and 10<sup>8</sup> as well.</p><p>We found that the increased errors in likelihood approximation accuracy for the smaller LAN budgets reported above were reflected in the inference accuracy as well (see updated Figure 3):</p><p>To provide concrete examples of how using a LAN trained on a smaller budget affects the inferred posterior, we added a supplementary figure (Figure 3—figure supplement 2, see end of document) showing posteriors inferred with the smaller LAN budgets. 
Additionally, we calculated simulation-based calibration (SBC) results for the two smaller LAN budgets and found that calibration quality decreased with the simulation budget; we updated Figure 4B, the figure caption, and the main text accordingly.</p><p>We thank reviewer 2 for the pointer to parameter confusion as an additional metric. However, we did not compute this additional metric, as we believe that the current results on inference accuracy for the smaller simulation budgets convincingly demonstrate the efficiency of MNLE relative to LAN. Overall, they show that for budgets smaller than 10<sup>11</sup> simulations, inference with LANs was not nearly as accurate as with MNLE, whereas MNLE was close to the reference performance on parameter estimation accuracy. We adapted the figure captions and the results reported in the main text accordingly.</p><disp-quote content-type="editor-comment"><p>3. The following limitations of MNLE estimators (limitations shared with LAN estimators) should be discussed more explicitly in the manuscript:</p><p>– The drift-diffusion model used to benchmark the approach has a very low-dimensional observation structure (one binary observation + one continuous observation per trial). This limitation is not necessarily a problem because many models in cognitive neuroscience have a very low-dimensional observation structure, but it is worth mentioning.</p><p>– The approach described works for i.i.d. data. Any additional structure/dependence (e.g., adding parameters to characterize the stimulus shown in the trial) effectively increases the dimensionality of the likelihood approximation the network needs to learn.</p><p>– The manuscript does not sufficiently discuss nor explore the scalability of the MNLE approach.
Given the previous related work by some of the authors, that has shown remarkable results in this respect, it would be useful to discuss the scalability of MNLEs in the discussion.</p><p>– Like any neural-network based approach, selecting the hyperparameters and architecture of the network requires either prior knowledge and/or brute force. And it is currently unclear how in practice the MNLE approach can be applied for different models. Importantly, even if training a MNLE estimator *given the hyperparameters* might require a small number of simulations (10^5), we need to account for the additional cost of finding the correct hyperparameters for training the emulator (presumably, requiring an exploration of at least 10-100 hyperparameter settings).</p></disp-quote><p>We agree that these are four important points and changed the third paragraph in the discussion accordingly (see below).</p><p>Regarding additional model or stimulus parameters, we agree that these will effectively increase the dimensionality of the learning problem, but we note that, under the i.i.d. assumption, additional parameter dependencies, e.g., for hierarchical inference scenarios, will not require retraining MNLE but will only affect the inference step with MCMC or VI.</p><p>Regarding the selection of neural network settings, we agree that this was not pointed out clearly enough in the manuscript. MNLE is now fully integrated into the <italic>sbi</italic> toolbox and was set up with default settings that we expect to work well in many applications (see, e.g., the sbi benchmark by Lueckmann et al., 2021, who showed that the same settings worked well for NLE across a wide range of tasks). Additionally, with the evaluation techniques available in the <italic>sbi</italic> toolbox, e.g., posterior predictive checks and simulation-based calibration, it is now possible for users to evaluate their trained MNLE emulator.
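As a concrete illustration of what simulation-based calibration checks, the following is a minimal, self-contained NumPy sketch. It substitutes a toy Gaussian model with a tractable posterior for a trained MNLE emulator; all function and variable names are illustrative and are not part of the <italic>sbi</italic> API.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_sbc(num_sbc_runs=200, num_trials=10, num_posterior_samples=99):
    """Simulation-based calibration for a toy Gaussian model.

    Prior: theta ~ N(0, 1). Likelihood: x | theta ~ N(theta, 1), i.i.d. trials.
    The exact conjugate posterior stands in for MNLE-plus-MCMC samples.
    """
    ranks = []
    for _ in range(num_sbc_runs):
        theta_o = rng.normal(0.0, 1.0)                    # draw from the prior
        x_o = rng.normal(theta_o, 1.0, size=num_trials)   # simulate a data set
        # Conjugate Gaussian posterior over theta given x_o.
        post_var = 1.0 / (1.0 + num_trials)
        post_mean = post_var * x_o.sum()
        samples = rng.normal(post_mean, np.sqrt(post_var), num_posterior_samples)
        # Rank of the ground-truth parameter among the posterior samples.
        ranks.append(int((samples < theta_o).sum()))
    return np.array(ranks)

# For a well-calibrated posterior, the ranks are uniform on {0, ..., 99};
# systematic deviations indicate over- or under-confident posteriors.
ranks = run_sbc()
```

A histogram of such ranks, computed per estimator, is the type of diagnostic summarized in Figure 4B.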
Nevertheless, it is important to emphasize that it might be necessary to adapt the MNLE architecture to a given application and to repeat training multiple times to try different settings.</p><p>To address all four points, we adapted the discussion as follows (extending the third paragraph in the discussion):</p><p>“[…]. In contrast, MNLE can be applied to any simulation-based model with intractable likelihoods and mixed data-type outputs. Here, we focused on the direct comparison to LANs based on variants of the DDM. We note that these models have a rather low-dimensional observation structure (as is common in many cognitive neuroscience models), and that our examples did not include additional parameter structure, e.g., stimulus encoding parameters, which would increase the dimensionality of the learning problem. However, other variants of neural density estimation have been applied successfully to a variety of problems with higher dimensionality (see, e.g., Gonçalves et al., 2020; Lueckmann et al., 2021; Glöckler et al., 2021; Dax et al., 2022). Therefore, we expect MNLE to be applicable to other simulation-based problems with higher-dimensional observation structure and parameter spaces, and to scale more favorably than LANs.</p><p>As for any neural network-based approach, applying MNLE to different inference problems may require selecting different architecture and training hyperparameter settings, which may induce additional computational training costs.
To help with this process, we adopted default settings which have been shown to work well on a large range of SBI benchmarking problems (Lueckmann et al., 2021), and we integrated MNLE into the established <italic>sbi</italic> Python package, which provides well-documented implementations for training and inference with SBI algorithms.”</p><disp-quote content-type="editor-comment"><p>Reviewer #2 (Recommendations for the authors):</p><p>Let me now explain why I think that authors' main claim (namely: that MNLE is more efficient than LAN) may not be strongly supported by the results reported in the current version of the manuscript.</p><p>The main piece of evidence is summarized in Section 2.2, where authors claim that MNLE learns likelihood function as accurately as LAN, but with only a fraction of the simulation budget. In brief, authors showed that a MNLE estimator trained with 10^5 model simulations is as accurate as a pre-trained version of the LAN method for DDMs (which used 10^11 model simulations). The problem here, is that they did not show that a LAN estimator trained with only 10^5 model simulations yields less accurate results. Or that MNLE trained with 10^11 simulations yields much better results. More generally, what is needed here is a quantitative comparison of the efficiency of the method. One would expect that, for both MNLE and LAN, increasing the number of model simulations increases the accuracy of likelihood approximation. However, this probably follows the law of diminishing marginal utility (i.e. using 10^11 simulations may bring little advantage when compared to e.g. 10^10). Hence, efficiency here should be measured in terms of the speed at which the results accuracy increases. In other terms, claiming that MNLE is more efficient than LAN is important, but requires more work than what is done here. Authors should systematically vary the number of model simulations (spanning, e.g.
the 10^5 to 10^11 range), and quantify each method's efficiency in terms of the accuracy profile over this range.</p><p>Now, this analysis should be performed with different "accuracy" metrics. In particular, authors may measure likelihood approximation accuracy, parameter estimation accuracy and estimation uncertainty accuracy. The latter are critical, because errors in likelihood approximations may have a very small impact on parameter estimation. I note that, w.r.t. parameter estimation accuracy and estimation uncertainty accuracy, I don't think authors chose very informative metrics. For example, the former is defined as the "absolute difference in posterior sample mean normalized by reference posterior standard deviation". If I understood correctly, this metric may show significant differences between MNLE and LAN that would have no practical consequence whatsoever, given that methods would be very similar in terms of their absolute parameter estimation errors. In fact, this is what Figure 4A seems to indicate. I would rather suggest to use accuracy metrics that practically "mean something" for parameter recovery. Measuring parameter estimation error is a possibility (although one may then conclude that MNLE brings no clear advantage when compared to LAN, cf. Figure 4A). But there are other possibilities. For example, likelihood approximation errors may induce pairwise parameter confusions. In turn, for a small simulation budget, LAN estimators may show more pairwise parameter confusion than MNLE. Note: to quantify pairwise parameter confusion in the context of DDM, authors may take inspiration from Feltgen 2021 (or any other metric of their choice).</p></disp-quote><p>Thank you for the detailed review and explanation of your concerns with the paper. As mentioned above in the point-by-point response to the essential revisions, we have now been able to train LANs ourselves and to run experiments with smaller simulation budgets.
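For concreteness, the likelihood approximation accuracy metrics used in the updated Figure 2 (MSE and Huber loss between approximate and reference log-likelihoods) can be computed as in the following NumPy sketch. The arrays and the Huber delta of 1.0 are illustrative placeholders, not our exact evaluation settings.

```python
import numpy as np

def mse(approx_ll, reference_ll):
    """Mean squared error between approximate and reference log-likelihoods."""
    diff = np.asarray(approx_ll) - np.asarray(reference_ll)
    return float(np.mean(diff ** 2))

def huber(approx_ll, reference_ll, delta=1.0):
    """Huber loss: quadratic for errors below delta, linear above."""
    err = np.abs(np.asarray(approx_ll) - np.asarray(reference_ll))
    quad = np.minimum(err, delta)
    return float(np.mean(0.5 * quad ** 2 + delta * (err - quad)))

# Hypothetical example: one emulator's log-likelihoods vs. the reference.
reference = np.array([-1.2, -0.7, -3.5, -2.1])
emulator = np.array([-1.1, -0.8, -3.0, -2.2])
errors = (mse(emulator, reference), huber(emulator, reference))
```

Repeating this computation for emulators trained at each simulation budget yields the accuracy-versus-budget profiles requested by the reviewer.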
We trained LANs with budgets of 10<sup>4</sup>, 10<sup>5</sup>, 10<sup>6</sup>, 10<sup>7</sup>, and 10<sup>8</sup> simulations and observed, as expected, an overall decrease in likelihood approximation accuracy with decreasing budget. As addressed in detail above and now reported in the updated version of Figure 2, we found that the likelihood approximation accuracy of LAN matched that of MNLE (trained with 10<sup>5</sup> simulations) only for a budget of 10<sup>11</sup> simulations. Performance decreased monotonically with decreasing budget, resulting in significantly worse likelihood approximations as well as worse posterior accuracy, posterior calibration, and parameter point estimation accuracy. We clarified the algorithmic differences between LAN and MNLE, and the reason for the efficiency difference, in the main text (Methods and Materials—Relation to LAN).</p><p>We also thank you for the suggestions regarding performance metrics. We updated the posterior accuracy metrics to be in absolute terms and added the parameter estimation accuracy metric as described above. As our additional results confirmed our expectations with respect to LAN performance on smaller budgets, we did not add a metric for parameter confusion.</p><p>Overall, we hope that our additional experiments alleviate your concerns regarding the efficiency of MNLE compared to LAN.</p></body></sub-article></article>