---
authors:
  - givenNames:
      - Hugh
    familyNames:
      - Pastoll
    type: Person
    affiliations:
      - name: 'Centre for Discovery Brain Sciences, University of Edinburgh'
        address:
          addressCountry: United Kingdom
          addressLocality: Edinburgh
          type: PostalAddress
        type: Organization
  - givenNames:
      - Derek
      - L
    familyNames:
      - Garden
    type: Person
    affiliations:
      - name: 'Centre for Discovery Brain Sciences, University of Edinburgh'
        address:
          addressCountry: United Kingdom
          addressLocality: Edinburgh
          type: PostalAddress
        type: Organization
  - givenNames:
      - Ioannis
    familyNames:
      - Papastathopoulos
    type: Person
    affiliations:
      - name: The Alan Turing Institute
        address:
          addressCountry: United States
          addressLocality: London
          type: PostalAddress
        type: Organization
      - name: >-
          School of Mathematics, Maxwell Institute and Centre for Statistics,
          University of Edinburgh
        address:
          addressCountry: United Kingdom
          addressLocality: Edinburgh
          type: PostalAddress
        type: Organization
  - givenNames:
      - Gülşen
    familyNames:
      - Sürmeli
    type: Person
    affiliations:
      - name: 'Centre for Discovery Brain Sciences, University of Edinburgh'
        address:
          addressCountry: United Kingdom
          addressLocality: Edinburgh
          type: PostalAddress
        type: Organization
  - givenNames:
      - Matthew
      - F
    familyNames:
      - Nolan
    type: Person
    emails:
      - mattnolan@ed.ac.uk
    affiliations:
      - name: 'Centre for Discovery Brain Sciences, University of Edinburgh'
        address:
          addressCountry: United Kingdom
          addressLocality: Edinburgh
          type: PostalAddress
        type: Organization
editors:
  - givenNames:
      - Lisa
    familyNames:
      - Giocomo
    type: Person
    affiliations:
      - name: Stanford School of Medicine
        address:
          addressCountry: United States
          type: PostalAddress
        type: Organization
datePublished:
  value: '2020-02-13'
  type: Date
dateReceived:
  value: '2019-09-26'
  type: Date
dateAccepted:
  value: '2020-02-04'
  type: Date
title: >-
  Inter- and intra-animal variation in the integrative properties of stellate
  cells in the medial entorhinal cortex
description: >-
  Distinctions between cell types underpin organizational principles for nervous
  system function. Functional variation also exists between neurons of the same
  type. This is exemplified by correspondence between grid cell spatial scales
  and the synaptic integrative properties of stellate cells (SCs) in the medial
  entorhinal cortex. However, we know little about how functional variability is
  structured either within or between individuals. Using ex-vivo patch-clamp
  recordings from up to 55 SCs per mouse, we found that integrative properties
  vary between mice and, in contrast to the modularity of grid cell spatial
  scales, have a continuous dorsoventral organization. Our results constrain
  mechanisms for modular grid firing and provide evidence for inter-animal
  phenotypic variability among neurons of the same type. We suggest that neuron
  type properties are tuned to circuit-level set points that vary within and
  between animals.
isPartOf:
  volumeNumber: 9
  isPartOf:
    title: eLife
    issns:
      - 2050-084X
    identifiers:
      - name: nlm-ta
        propertyID: 'https://registry.identifiers.org/registry/nlm-ta'
        value: elife
        type: PropertyValue
      - name: publisher-id
        propertyID: 'https://registry.identifiers.org/registry/publisher-id'
        value: eLife
        type: PropertyValue
    publisher:
      name: 'eLife Sciences Publications, Ltd'
      type: Organization
    type: Periodical
  type: PublicationVolume
licenses:
  - url: 'http://creativecommons.org/licenses/by/4.0/'
    content:
      - content:
          - 'This article is distributed under the terms of the '
          - content:
              - Creative Commons Attribution License
            target: 'http://creativecommons.org/licenses/by/4.0/'
            type: Link
          - >-
            , which permits unrestricted use and redistribution provided that
            the original author and source are credited.
        type: Paragraph
    type: CreativeWork
keywords:
  - entorhinal cortex
  - synaptic integration
  - presynaptic function
  - multi-vesicular release
  - synaptic vesicle
  - Mouse
identifiers:
  - name: publisher-id
    propertyID: 'https://registry.identifiers.org/registry/publisher-id'
    value: 52258
    type: PropertyValue
  - name: doi
    propertyID: 'https://registry.identifiers.org/registry/doi'
    value: 10.7554/eLife.52258
    type: PropertyValue
  - name: elocation-id
    propertyID: 'https://registry.identifiers.org/registry/elocation-id'
    value: e52258
    type: PropertyValue
fundedBy:
  - identifiers:
      - value: 200855/Z/16/Z
        type: PropertyValue
    funders:
      - name: Biotechnology and Biological Sciences Research Council (BBSRC)
        type: Organization
    type: MonetaryGrant
  - identifiers:
      - value: BB/1022147/1
        type: PropertyValue
    funders:
      - name: Biotechnology and Biological Sciences Research Council (BBSRC)
        type: Organization
    type: MonetaryGrant
  - identifiers:
      - value: BB/H020284/1
        type: PropertyValue
    funders:
      - name: Biotechnology and Biological Sciences Research Council
        type: Organization
    type: MonetaryGrant
  - identifiers:
      - value: 200855/Z/16/Z
        type: PropertyValue
    funders:
      - name: Wellcome
        type: Organization
    type: MonetaryGrant
about:
  - name: Neuroscience
    type: DefinedTerm
genre:
  - Research Article
bibliography: elife-52258.references.bib
---

# Introduction

The concept of cell types provides a general organizing principle for understanding biological structures including the brain (@bib65; @bib84). The simplest conceptualization of a neuronal cell type, as a population of phenotypically similar neurons with features that cluster around a single set point (@bib79), is extended by observations of variability in cell type features, suggesting that some neuronal cell types may be conceived as clustering along a line rather than around a point in a feature space (@bib19; @bib55; [Figure 1A](#fig1)). Correlations between the functional organization of sensory, motor and cognitive circuits and the electrophysiological properties of individual neuronal cell types suggest that this feature variability underlies key neural computations (@bib1; @bib3; @bib24; @bib28; @bib30; @bib46; @bib55). However, within-cell type variability has typically been deduced by combining data obtained from multiple animals. By contrast, the structure of variation within individual animals or between different animals has received little attention. For example, apparent clustering of properties along lines in feature space could reflect a continuum of set points, or could result from a small number of discrete set points that are obscured by inter-animal variation ([Figure 1B](#fig1)). Moreover, although investigations of invertebrate nervous systems show that set points may differ between animals (@bib36), it is not clear whether mammalian neurons exhibit similar phenotypic diversity ([Figure 1B](#fig1)). Distinguishing these possibilities requires many more electrophysiological observations for each animal than are obtained in typical studies.

```{r General setup, warning=FALSE, message=FALSE}
print("Load data and general setup")

# The packages broomExtra, corpcor must also be installed.
library(tidyverse)
library(cowplot)
library(lme4)
library(GGally)
library(MuMIn)
source("Functions.R")

# Use standard cowplot theme throughout
theme_set(theme_cowplot())

# Load stellate cell data
fname.sc <- "Data/datatable_sc.txt"
data.import.sc <- read_tsv(fname.sc)

# Strip out rows from data where locations are unknown (are NaN)
data.sc <- data.import.sc %>% drop_na(dvloc)

# Convert dvloc from microns to millimetres - prevents errors in model fitting large dv values
data.sc <- mutate(data.sc, dvlocmm = dvloc/1000)

# Add the number of observations for each mouse as a column
counts <- count(data.sc, id)
data.sc$counts <- counts$n[match(data.sc$id, counts$id)]

# Make sure id, hemi, housing, mlpos, expr, patchdir are factors
col_facs <- c("id", "hemi", "housing", "mlpos", "expr", "patchdir")
data.sc[col_facs] <- lapply(data.sc[col_facs], factor)

# Load data from Wfs1+ve cells
fname.wfs <- "Data/datatable_cal.txt"
data.import.wfs <- read_tsv(fname.wfs)
data.wfs <- data.import.wfs %>% drop_na(dvloc)
data.wfs <- mutate(data.wfs, dvlocmm = dvloc/1000)

# Convert SC data to nested tidy format. This is required for use of purrr::map, etc.
data.sc_r <- data.sc %>%
  dplyr::select(vm:fi, dvlocmm, id, housing, id, mlpos, hemi, age, housing, expr, patchdir, rectime, counts) %>%
  gather("property", "value", vm:fi) %>%
  group_by(property) %>%
  nest %>%
  ungroup

# Convert Wfs1+ve cell data to tidy format.
data.wfs_r <- filter(data.wfs, classification == "Pyr") %>%
  select(-classification) %>%
  gather("property", "value", vm:fi) %>%
  group_by(property) %>%
  nest()

# This is required for Figure 4D, E, 7B and Table 1.
# Code is here as it only needs to be run once.

# Model for uncorrelated random intercept and slope
model_vsris <- function(df) {
  lme4::lmer(value ~ dvlocmm +(dvlocmm||id), data = df, REML = FALSE, na.action = na.exclude)
}

# Null model for uncorrelated random intercept and slope
model_vsris_null <- function(df) {
  lme4::lmer(value ~ dvlocmm||id, data = df, REML = FALSE, na.action = na.exclude)
}

# Fit models to the data
data.sc_r <- data.sc_r %>%
  mutate(mm_vsris = map(data, model_vsris))%>%
  mutate(mm_vsris_null = map(data, model_vsris_null))

# General clustering parameters.
k.max <- 8 # Maximum number of clusters

# d.power is the  power to raise Euclidian distances to. Default is 1, Tibshirani used 2.
# NB: changing this requires the threshold fit initial parameter guess values to be changed according to the comments in the 'fit_thresholds' function.
d.power <- 2

iter.max <-
  100 # Maximum iterations the kmeans algorithm will run. Default is 10
nstart <-
  50 # The number of times the kmeans algorithm initialises. Total stability needs > 200

# Threshold parameters
threshold_criterion = 0.01 # Specify false positive rate for each evaluated k

# Parameters for uniform sampled simulated datasets. For calculating thresholds and calculated
# reference dispersions.
sim_unif_dataset_sizes <-
  seq(from = 20, to = 100, by = 10) # A wide range is useful for stable fits
sim_unif_n_sims <- 20000 # >10000 provides sufficienty good fits

# Parameters for example multimodal simulation illustration of clustering with different k
mm_sims <- list()
mm_sims$n_ex_modes <- 5 # Number of clusters
mm_sims$ex_mode_sep_sd <-
  c(3.5, 4, 4.5, 5, 5.5, 6) # Separations between cluster centers

# Parameters for evaluating cluster detection sensitivity
mm_sims$n <-
  5000 # Simulated datasets per combination of factors. 5000 provides stable results.
mm_sims$sep_sd_max <-
  6 # Range of separations of standard deviations to evaluate
mm_sims$sep_sd_min <- 3
mm_sims$sep_sd_incr <- 0.5

mm_sims$n_data_max <-  70 # Range of multimodal dataset sizes to evaluate.
mm_sims$n_data_min <- 20
mm_sims$n_data_incr <- 10

mm_sims$k_max <- 8 # Range of numbers of clusters (k) to evaluate
mm_sims$k_min <- 2
mm_sims$k_incr <- 1

# Other variables (help with import, categorisation and plotting)
mouse_ind <-  8 # Mouse to visualise clustering for. Takes integer values.

props_subthresh <- c("vm", "ir", "sag", "tau", "resf", "resmag")
props_suprathresh <-
  c("spkthr", "spkmax", "spkhlf", "rheo", "ahp", "fi")
property_names <- c(props_subthresh, props_suprathresh)
props_sub_labels <-
  c(
    "vm" = "Vm",
    "ir" = "IR",
    "sag" = "Sag",
    "tau" = "Mem. tau",
    "resf" = "Res. freq.",
    "resmag" = "Res. mag."
  )
props_supra_labels <-
  c(
    "spkthr" = "Spk. thresh.",
    "spkmax" = "Spk. max.",
    "spkhlf" = "Spk. halfwidth",
    "rheo" = "Rheobase",
    "ahp" = "AHP",
    "fi" = "FI"
  )
property_labels <- c(props_sub_labels, props_supra_labels)


# Generate simulated delta gap data. If sim_unif_n_sims is high this will take a long time
# to run. Will look for a loaded version of slopes_store, then for a saved version.
# If it finds neither then will build slopes_store (and dispersion_store).
if (exists("delta_gaps_store")) {
  print("delta_gaps_store already generated, remove it if you want to rebuild or reload it")
  if (NROW(delta_gaps_store) != sim_unif_n_sims) {
    warning("Number of simulated uniform datasets differs from parameter value")
  }
} else {
  if (file.exists("GapData/unif_sims_delta_gaps_and_dispersions.Rda")) {
    load("GapData/unif_sims_delta_gaps_and_dispersions.Rda")
    print("Loaded simulated delta gaps and dispersions")
    if (NROW(delta_gaps_store) != sim_unif_n_sims) {
      warning("Number of simulated uniform datasets differs from parameter value")
    }
  } else {
    # Create unif_sims_delta_gaps_and_dispersions.Rda
    gen_unif_sims(sim_unif_n_sims,
                  sim_unif_dataset_sizes,
                  k.max = k.max,
                  d.power = d.power)
    # Load 'delta_gaps_store' and 'dispersion_store' variables
    load("GapData/unif_sims_delta_gaps_and_dispersions.Rda")
  }
}

# Fitting the delta gap thresholds and dispersions
# It's important to check after each regeneration of simulated data that the threshold fits are appropriate. With less than 10000 runs, the thresholds can be quite noisy and it's possible that the fits won't converge. The form of the fitted equations is $y = (a/n^{b}) + c$, where $n$ is the number of data points in the evaluated dataset and $a, b, c$ are the parameters to be fitted.
# 
# In addition, to speed up evaluation of the reference dispersion to calculate the modified gap statistic k_est, it's also possible to fit the dispersions obtained from the simulated data and reuse this information so that multiple bootstrap samples don't have to be drawn for every evaluated dataset. Dispersion fits use a different fitting equation: $y = a*log(n) + b$, where $a, b$ are the parameters to be fitted. 

# Get the delta gap thresholds (and the delta gaps themselves for plotting)
delta_gaps_thresholds <-
  get_delta_gaps_thresh_vals(delta_gaps_store,
                             threshold_criterion =
                               threshold_criterion)
# Fit delta_slope_thresh_vals
threshold_fit_results <-
  fit_thresholds(wd, delta_gaps_thresholds, sim_unif_dataset_sizes)

# Find the average dispersion for each k across all simulations (prevents having to re-simulate
# several uniform distributions every time the modified gap statistic is evaluated)
mean_dispersions <- get_mean_dispersions(dispersion_store)
# Fit mean_dispersions with log function
dispersion_fit_results <-
  fit_dispersions(wd, mean_dispersions, sim_unif_dataset_sizes)

```

chunk: Figure 1A
:::
## Classification and variability of neuronal cell types

(**A**) Neuronal cell types are identifiable by features clustering around a distinct point (blue, green and yellow) or a line (orange) in feature space. The clustering implies that neuron types are defined by either a single set point (blue, green and yellow) or by multiple set points spread along a line (orange).

```{r}
#' @width 9
#' @height 9

# Make cell distributions for each 'cell type'
numcells <- 100
Cell_A <- tibble(x = rnorm(numcells, 10, 1),
                       y = rnorm(numcells, 12, 1),
                       cell = "A")
Cell_B <- tibble(x = runif(numcells, min = 20, max = 40) + rnorm(numcells,0,1),
                       y = rnorm(numcells, 25, 2),
                       cell = "B")
Cell_C <- tibble(x = rnorm(numcells, 10, 2),
                       y = rnorm(numcells, 35, 2),
                       cell = "C")
Cell_D <- tibble(x = rnorm(numcells, 30, 2),
                       y = rnorm(numcells, 10, 2),
                       cell = "D")
 
CellFeatures <- bind_rows(Cell_A, Cell_B, Cell_C, Cell_D)

# Plot 'cell features' using a colour blind friendly palette (from http://www.cookbook-r.com/Graphs/Colors_(ggplot2)/).
cbPalette <- c("#E69F00", "#D55E00", "#56B4E9", "#009E73")

CF_plot <- ggplot(CellFeatures, aes(x, y, colour = cell)) +
  geom_point() +
  xlim(0,45) +
  ylim(0,45) +
  labs(x = "Feature 1", y = "Feature 2", colour = "Cell Type", title = "Cell type separation") +
  scale_colour_manual(values=cbPalette) +
  theme_classic() +
  theme(legend.position = "bottom",
        axis.ticks = element_blank(),
        axis.text = element_blank(),
        legend.text = element_blank())
CF_plot
```
:::
{#fig1a}


chunk: Figure 1B
:::
## Classification and variability of neuronal cell types

(**B**) Phenotypic variability of a single neuron type could result from distinct set points that subdivide the neuron type but appear continuous when data from multiple animals are combined (modular), from differences in components of a set point that do not extend along a continuum but that in different animals cluster at different locations in feature space (orthogonal), or from differences between animals in the range covered by a continuum of set points (offset). These distinct forms of variability can only be made apparent by measuring the features of many neurons from multiple animals.

```{r}
# Example 'modular' data
numcells <- 20
MF_A <- tibble(x = c(rnorm(numcells, 26, 0.5), rnorm(numcells, 31, 0.5), rnorm(numcells, 36, 0.5)),
                   y = rnorm(numcells*3, 25, 2),
                   animal = "1")
MF_B <- tibble(x = c(rnorm(numcells, 28.5, 0.5), rnorm(numcells, 33.5, 0.5), rnorm(numcells, 38.5, 0.5)),
                    y = rnorm(numcells*3, 25, 2),
                    animal = "2")
                   
ModularFeatures <- rbind(MF_A, MF_B)

# Example 'orthogonal' data
numcells <- 60

OOF_A <- tibble(x = runif(numcells, min = 25, max = 40) + rnorm(numcells,0,1),
                y = rnorm(numcells, 23, 1),
                animal = "1")
OOF_B <- tibble(x = runif(numcells, min = 25, max = 40) + rnorm(numcells,0,1),
                y = rnorm(numcells, 27, 1),
                animal = "2")
     
OrthogOffsetFeatures <- rbind(OOF_A, OOF_B)

# Example 'offset' data
numcells <- 60

LOF_A <- tibble(x = runif(numcells, min = 25, max = 35) + rnorm(numcells,0,1),
                y = rnorm(numcells, 25, 2),
                animal = "1")
LOF_B <- tibble(x = runif(numcells, min = 30, max = 40) + rnorm(numcells,0,1),
                y = rnorm(numcells, 25, 2),
                animal = "2")
LinearOffsetFeatures <- rbind(LOF_A, LOF_B)

# Combine into a plot
ModularFeatures$scheme <- "modular"
OrthogOffsetFeatures$scheme <- "orthog"
LinearOffsetFeatures$scheme <- "offset"
IntraAnimal <- bind_rows(ModularFeatures, OrthogOffsetFeatures, LinearOffsetFeatures)
IntraAnimal$scheme <- as.factor(IntraAnimal$scheme)
IntraAnimal$scheme = factor(IntraAnimal$scheme, c("modular", "orthog","offset"))
labels_schemes <- c(modular = "Modular", orthog = "Orthogonal", offset = "Offset")

IntraAnimalPlot <- ggplot(IntraAnimal, aes(x, y, alpha = animal)) +
  geom_point(colour = cbPalette[2]) +
  xlim(20,45) +
  ylim(20,30) +
  labs(x = "Feature 1", y = "Feature 2", alpha = "Animal", title = "Within cell type variability") +
  facet_wrap(~scheme, nrow = 3, labeller = labeller(scheme = labels_schemes)) +
  theme_classic() +
  theme(strip.background = element_blank(),
        axis.ticks = element_blank(),
        axis.text = element_blank()) +
  scale_alpha_discrete(range=c(0.1,1)) +
  theme(legend.position = "bottom",
        legend.text = element_blank())

print(IntraAnimalPlot)
```
:::
{#fig1b}


figure: Figure 1—figure supplement 1
:::
![](elife-52258.rmd.media/fig1-figsupp1.jpg)

## A quantitative adaptation of the gap statistic clustering algorithm.

(**A–C**) Logic of the gap statistic. (**A**) Simulated clustered dataset with three modes (k = 3) (open gray circles) (upper) and the corresponding simulated reference dataset drawn from a uniform distribution with lower and upper limits set by the minimum and maximum values from the corresponding clustered dataset (open gray diamonds). Data were allocated to clusters by running a K-means algorithm 20 times on each set of data and selecting the partition with the lowest average intracluster dispersion. Red, green, blue and yellow dashed ovals show a realization of the data subsets allocated to each cluster when k~Eval~ = 1, 2, 3 and 4 modes. (**B**) The average value of the log intracluster dispersion for the clustered (open circles) and reference (open diamonds) datasets for each value of k tested in panel (**A**). (**C**) Gap values resulting from the difference between the clustered and reference values for each k in panel (**B**). Many (≥500 in this paper) reference distributions are generated and their mean intracluster dispersion values are subtracted from those arising from the clustered distribution to maximize the reliability of the Gap values. (**D**) A procedure for selecting the optimal k depending on the associated gap values. Quantitative procedure for selecting optimal k (k~est~). ∆Gap values (open circles) are obtained by subtracting from the Gap value of a given k the Gap value for the previous k (∆Gap~k~ = Gap~k~ – Gap~k-1~). For each ∆Gap~k~, if the ∆Gap value is greater than a threshold (filled triangles) associated with that ∆Gap~k~, that ∆Gap~k~ will be k~est~, if no ∆Gap exceeds, the threshold, k~est~ = 1. In the figure, for ∆Gap~k~ = 2, 3, 4 (brown, pink and cyan), the ∆Gap value exceeds its threshold only when ∆Gap~k~ = 3. Therefore k~est~ = 3. (**E–G**) Determination of ∆Gap~k~ thresholds. (**E**) Histograms of ∆Gap values calculated from 20,000 simulated datasets drawn from uniform distributions for each ∆Gap~k~ (brown, pink and cyan, respectively, for ∆Gap~k~ = 2, 3, 4) for a single dataset size (n = 40). ∆Gap thresholds (filled triangles) are the values beyond which 1% of the ∆Gap values fall and vary with ∆Gap~k~. (**F**) Histograms of ∆Gap values for a range of dataset sizes (n = 20, 40, 100) and their associated thresholds. (**G**) Plot of the ∆Gap thresholds as a function of dataset size and ∆Gap~k~. For separate ∆Gap~k~, ∆Gap threshold values are fitted well by a hyperbolic function of dataset size. These fits provide a practical method of finding the appropriate ∆Gap threshold for an arbitrary dataset size.
:::


figure: Figure 1—figure supplement 2
:::
![](elife-52258.rmd.media/fig1-figsupp2.jpg)

## Discrimination of continuous from modular organizations using the adapted gap statistic algorithm.

(**A**) Simulated datasets (each n = 40) drawn from continuous (uniform, k = 1 mode) (upper) and modular (multimodal mixture of Gaussians with k = 2 modes) (lower) distributions, and plotted against simulated dorsoventral locations. Also shown are the probability density functions (pdf) used to generate each dataset (light blue) and the densities estimated post-hoc from the generated data as kernel smoothed densities (light gray pdfs). (**B**) Histograms showing the distribution of k~est~ from 1000 simulated datasets drawn from each pdf in panel (**A**). k~est~ is determined for each dataset by a modified gap statistic algorithm (see [Figure 1—figure supplement 1](#fig1s1) above). When k~est~ = 1, the dataset is considered to be continuous (unclustered), when k~est~ ≥2, the dataset is considered to be modular (clustered). The algorithm operates only on the feature values and does not use location information. (**C**) Illustration of a set of clustered (k = 2) pdfs with the distance (in standard deviations) between clusters ranging from 2 to 6 (upper). Systematic evaluation of the ability of the modified gap statistic algorithm to detect clustered organization (k~est~ ≥2) in simulated datasets of different size (n = 20 to 100) drawn from the clustered (filled blue) and continuous (open blue) pdfs (lower). The proportion of datasets drawn from the continuous distribution that have k~est~ ≥2 is the false positive (FP) rate (pFP = ~0.07). The light gray filled circle shows the smallest dataset size (n = 40) with SD = 5, where the proportion of datasets detected as clustered (p~detect~) is ~0.8. (**D**). Plot showing how p~detect~ at n = 40, SD = 5 changes when datasets are drawn from pdfs with different numbers of clusters (n modes from 2 to 8). Further evaluation of the analysis of additional clusters is represented in the following figure.
:::


figure: Figure 1—figure supplement 3
:::
![](elife-52258.rmd.media/fig1-figsupp3.jpg)

## Additional evaluation of the adapted gap statistic algorithm.

(**A–C**) Plots showing how p~detect~ (the ability of the modified gap statistic algorithm to detect clustered organization) depends on dataset size and separation between cluster modes in simulated datasets drawn from clustered pdfs with different numbers of modes. The gray markers indicate n = 40, SD = 5 (as shown in [Figure 1E](#fig1)). In each plot, p~detect~ is shown as a function of simulated dataset size and separation between modes when k = 3 (**A**), k = 5 (**B**) and k = 8 (**C**), which was the maximum number of clusters evaluated. (**D–F**) Histograms showing the counts of k~est~ from the 1000 simulated n = 40, SD = 5 datasets (gray filled circles) illustrated in panels (**A–C**), respectively. (**G**) p~detect~ as a function of dataset size and mode separation with k = 5 when cluster modes are unevenly sampled. Sample sizes from clusters vary randomly with each dataset. A single mode can contribute from all to none of the points in any simulated dataset. (**H**) p~detect~ as a function of dataset size and mode separation with k = 5 when the distance between mode centers increases by a factor of sqrt(2) between sequential cluster pairs. Data are shown for different initial separations (the distance between the first two cluster centers) ranging from 1 to 4 (with corresponding separations between the final cluster pair ranging from 4 to 16).
:::


figure: Figure 1—figure supplement 4
:::
![](elife-52258.rmd.media/fig1-figsupp4.jpg)

## Comparing the adapted gap statistic algorithm with discontinuity measures for discreteness.

(**A**) Counts of log discontinuity ratio scores generated from a simulated uniform data distribution. The data distribution was ordered and then sampled either at positions drawn at random from a uniform distribution (dark blue) or at positions with a fixed increment (light blue). For the data sampled at random positions, approximately half of the scores are >0 and for even sampling all scores are >0. Therefore, a threshold score >0 does not distinguish discrete from continuous distributions. (**B**) Comparison of p~detect~ as a function of dataset size for the adapted gap statistic algorithm, the discontinuity (upper) and the discreteness algorithm (lower). Each algorithm is adjusted to yield a 7% false positive rate. Each column shows simulations of data with different numbers of modes (k).
:::


figure: Figure 1—figure supplement 5
:::
![](elife-52258.rmd.media/fig1-figsupp5.jpg)

## Evaluation of modularity of grid firing using an adapted gap statistic algorithm.

Examples of grid spacing for individual neurons (crosses) from different mice. Kernel smoothed densities (KSDs) were generated with either a wide (solid gray) or a narrow (dashed lines) kernel. The number of modes estimated using the modified gap statistic algorithm is ≥ 2 for all but one animal (animal 4) with n ≥ 20 (animals 3 and 7 have &lt; 20 recorded cells). We did not have location information for animal 2.
:::


Stellate cells in layer 2 (SCs) of the medial entorhinal cortex (MEC) provide a striking example of correspondence between functional organization of neural circuits and variability of electrophysiological features within a single cell type. The MEC contains neurons that encode an animal’s location through grid-like firing fields (@bib27). The spatial scale of grid fields follows a dorsoventral organization (@bib42), which is mirrored by a dorsoventral organization in key electrophysiological features of SCs (@bib10; @bib21; @bib28; @bib30; @bib33; @bib58). Grid cells are further organized into discrete modules (@bib71), with the cells within a module having a similar grid scale and orientation (@bib6; @bib40; @bib71; @bib82); progressively more ventral modules are composed of cells with wider grid spacing (@bib71). Studies that demonstrate dorsoventral organization of integrative properties of SCs have so far relied on the pooling of relatively few measurements per animal. Hence, it is unclear whether the organization of these cellular properties is modular, as one might expect if they directly set the scale of grid firing fields in individual grid cells (@bib30). The possibility that set points for electrophysiological properties of SCs differ between animals has also not been considered previously.

Evaluation of variability between and within animals requires statistical approaches that are not typically used in single-cell electrophysiological investigations. Given appropriate assumptions, inter-animal differences can be assessed using mixed effect models that are well established in other fields (@bib4; @bib29). Because tests of whether data arise from modular as opposed to continuous distributions have received less general attention, to facilitate detection of modularity using relatively few observations, we introduce a modification of the gap statistic algorithm (@bib75) that estimates the number of modes in a dataset while controlling for observations expected by chance (see 'Materials and methods' and [Figure 1—figure supplements 1](#fig1s1)–[5](#fig1s5)). This algorithm performs well compared with discreteness metrics that are based on the standard deviation of binned data (@bib32; @bib71), which we find are prone to high false-positive rates ([Figure 1—figure supplement 4A](#fig1s4)). We find that recordings from approximately 30 SCs per animal should be sufficient to detect modularity using the modified gap statistic algorithm and given the experimentally observed separation between grid modules (see 'Materials and methods' and [Figure 1—figure supplements 2](#fig1s2)–[3](#fig1s3)). Although methods for high-quality recording from SCs in ex-vivo brain slices are well established (@bib59), typically fewer than five recordings per animal were made in previous studies, which is many fewer than our estimate of the minimum number of observations required to test for modularity.

We set out to establish the nature of the set points that establish the integrative properties of SCs by measuring intra- and inter-animal variation in key electrophysiological features using experiments that maximize the number of SCs recorded per animal. Our results suggest that set points for individual features of a neuronal cell type are established at the level of neuronal cell populations, differ between animals and follow a continuous organization.

# Results

## Sampling integrative properties from many neurons per animal

Before addressing intra- and inter-animal variability, we first describe the data set used for the analyses that follow. We established procedures to facilitate the recording of integrative properties of many SCs from a single animal (see 'Materials and methods'). With these procedures, we measured and analyzed electrophysiological features of 836 SCs (n/mouse: range 11–55; median = 35) from 27 mice (median age = 37 days, age range = 18–57 days). The mice were housed either in a standard home cage (dimensions: 0.2 × 0.37 m, N = 18 mice, n = 583 neurons) or from postnatal day 16 in a 2.4 × 1.2 m cage, which provided a large environment that could be freely explored (N = 9, n = 253, median age = 38 days) ([Figure 2—figure supplement 1](#fig2s1)). For each neuron, we measured six sub-threshold integrative properties ([Figure 2A–B](#fig2)) and six supra-threshold integrative properties ([Figure 2C](#fig2)). Unless indicated otherwise, we report the analysis of datasets that combine the groups of mice housed in standard and large home cages and that span the full range of ages.

figure: Figure 2
:::
![](elife-52258.rmd.media/fig2.jpg)

## Dorsoventral organization of intrinsic properties of stellate cells from a single animal.

(**A–C**) Waveforms (gray traces) and example responses (black traces) from a single mouse for step (**A**), ZAP (**B**) and ramp (**C**) stimuli (left). The properties derived from each protocol are shown plotted against recording location (each data point is a black cross) (right). KSDs with arbitrary bandwidth are displayed to the right of each data plot to facilitate visualization of the property’s distribution when location information is disregarded (light gray pdfs). (**A**) Injection of a series of current steps enables the measurement of the resting membrane potential (V~rest~) (**i**), the input resistance (IR) (ii), the sag coefficient (sag) (iii) and the membrane time constant (τ~m~) (iv). (**B**) Injection of ZAP current waveform enables the calculation of an impedance amplitude profile, which was used to estimate the resonance resonant frequency (Res. F) (**i**) and magnitude (Res. mag) (ii). (**C**) Injection of a slow current ramp enabled the measurement of the rheobase (i); the slope of the current-frequency relationship (I-F slope) (ii); using only the first five spikes in each response (enlarged zoom, lower left), the AHP minimum value (AHP~min~) (iii); the spike maximum (Spk. max) (iv); the spike threshold (Spk. thr.) (v); and the spike width at half height (Spk. HW) (vi).
:::
{#fig2}

figure: Figure 2—figure supplement 1
:::
![](elife-52258.rmd.media/fig2-figsupp1.jpg)


## Large environment for housing.

(**A, B**) The large cage environment viewed from above (**A**) and from inside (**B**).
:::

Because SCs are found intermingled with pyramidal cells in layer 2 (L2PCs), and as misclassification of L2PCs as SCs would probably confound investigation of intra-SC variation, we validated our criteria for distinguishing each cell type. To establish characteristic electrophysiological properties of L2PCs, we recorded from neurons in layer 2 that were identified by Cre-dependent marker expression in a _Wfs1_^Cre^ mouse line (@bib72). Expression of Cre in this line, and in a similar line (@bib44), labels L2PCs that project to the CA1 region of the hippocampus, but does not label SCs (@bib44; @bib72). We identified two populations of neurons in layer 2 of MEC that were labelled in _Wfs1_^Cre^ mice ([Figure 3A–C](#fig3)). The more numerous population had properties consistent with L2PCs ([Figure 3A,G](#fig3)) and could be separated from the unidentified population on the basis of a lower rheobase ([Figure 3C](#fig3)). The unidentified population had firing properties that were typical of layer 2 interneurons (@bib37). A principal component analysis (PCA) ([Figure 3D–F](#fig3)) clearly separated the L2PC population from the SC population, but did not identify subpopulations of SCs. The properties of the less numerous population were also clearly distinct from those of SCs ([Figure 3A,C](#fig3)). These data demonstrate that the SC population used for our analyses is distinct from other cell types also found in layer 2 of the MEC.


figure: Figure 3
:::
![](elife-52258.rmd.media/fig3.jpg)

## Distinct and dorsoventrally organized properties of layer 2 stellate cells.

(**A**) Representative action potential after hyperpolarization waveforms from a SC (left), a pyramidal cell (middle) and an unidentified cell (right). The pyramidal and unidentified cells were both positively labelled in _Wfs1^C^_^re^ mice. (**B**) Plot of the first versus the second principal component from PCA of the properties of labelled neurons in _Wfs1_^Cre^ mice reveals two populations of neurons. (**C**) Histogram showing the distribution of rheobase values of cells positively labelled in _Wfs1_^Cre^ mice. The two groups identified in panel (B) can be distinguished by their rheobase. (**D**) Plot of the first two principal components from PCA of the properties of the L2PC (n = 44, green) and SC populations (n = 836, black). Putative pyramidal cells (x) and SCs (+) are colored according to their dorsoventral location (inset shows the scale). (**E**) Proportion of total variance explained by the first five principal components for the analysis in panel (**D**). (**F**) Histograms of the locations of recorded SCs (upper) and L2PCs (lower). (**G**) All values of measured features from all mice are plotted as a function of the dorsoventral location of the recorded cells. Lines indicate fits of a linear model to the complete datasets for SCs (black) and L2PCs (green). Putative pyramidal cells (x, green) and SCs (+, black). Adjusted R^2^ values use the same color scheme.
:::
{#fig3}


chunk: Figure 3G—executable version
:::
## Distinct and dorsoventrally organized properties of layer 2 stellate cells.

(**G**) All values of measured features from all mice are plotted as a function of the dorsoventral location of the recorded cells. Lines indicate fits of a linear model to the complete datasets for SCs (black) and L2PCs (green). Putative pyramidal cells (x, green) and SCs (+, black). Adjusted R^2^ values use the same color scheme.

```{r}
#' @width 30
#' @height 20
plot_sc_wfs <- data.sc %>%
  select(vm:fi, dvlocmm) %>%
  mutate(classification = "SC")
plot_wfs <- data.wfs %>%
  select(vm:fi, dvlocmm, classification)
plot_sc_wfs <- bind_rows(plot_sc_wfs, plot_wfs) %>%
  gather("property", "value", vm:fi)

labels_intercepts <-
  c(
    ahp = "AHP min. (mV)",
    fi = "F-I (Hz / pA)",
    ir = "IR (MΩ)",
    resf = "Res F (Hz)",
    resmag = "Res. mag.",
    rheo = "Rheobase (pA)",
    sag = "Sag",
    spkhlf = "Spike h-w (ms)",
    spkmax = "Spike max. (mV)",
    spkthr = "Spike thres. (mV)",
    tau = "Tm (ms)",
    vm = "Vrest (mV)"
    )

figure_3 <- ggplot(plot_sc_wfs, aes(dvlocmm,value)) +
  geom_point(aes(colour = classification)) +
  scale_x_continuous(name = "Location (mm)",
                     breaks = c(0, 1, 2)) +
  scale_colour_discrete(name = "Neuron type") +
  facet_wrap(~property,
             scales = "free_y",
             labeller = labeller(property = labels_intercepts)) +
  theme_classic() +
  theme(axis.title.y = element_blank(),
        strip.background = element_blank())
print(figure_3)
```
:::


To further validate the large SC dataset, we assessed the location-dependence of individual electrophysiological features, several of which have previously been found to depend on the dorso-ventral location of the recorded neuron (@bib10; @bib11; @bib28; @bib30; @bib58; @bib83). We initially fit the dependence of each feature on dorsoventral position using a standard linear regression model. We found substantial (adjusted R^2^ >0.1) dorsoventral gradients in input resistance, sag, membrane time constant, resonant frequency, rheobase and the current-frequency (I-F) relationship ([Figure 3G](#fig3)). In contrast to the situation in SCs, we did not find evidence for dorsoventral organization of these features in L2PCs ([Figure 3G](#fig3)). Thus, our large dataset replicates the previously observed dependence of integrative properties of SCs on their dorsoventral position, and shows that this location dependence further distinguishes SCs from L2PCs.

## Inter-animal differences in the intrinsic properties of stellate cells

To what extent does variability between the integrative properties of SCs at a given dorsoventral location arise from differences between animals? Comparing specific features between individual animals suggested that their distributions could be almost completely non-overlapping, despite consistent and strong dorsoventral tuning ([Figure 4A](#fig4)). If this apparent inter-animal variability results from the random sampling of a distribution determined by a common underlying set point, then fitting the complete data set with a mixed model in which animal identity is included as a random effect should reconcile the apparent differences between animals ([Figure 4B](#fig4)). In this scenario, the conditional R^2^ estimated from the mixed model, in other words, the estimate of variance explained by animal identity and location, should be similar to the marginal R^2^ value, which indicates the variance explained by location only. By contrast, if differences between animals contribute to experimental variability, the mixed model should predict different fitting parameters for each animal, and the estimated conditional R^2^ should be greater than the corresponding marginal R^2^ ([Figure 4C](#fig4)).


chunk: Figure 4A—executable version
:::
## Inter-animal variability and dependence on environment of intrinsic properties of stellate cells.

(**A**) Examples of rheobase as a function of dorsoventral position for two mice.

```{r}
#' @width 12
#' @height 6

(rheo_example <- ggplot(filter(data.sc, id == "mouse_20131106" | id == "mouse_20140113"), aes(dvloc, rheo)) +
  geom_point(aes(colour = id), shape = 3) +
  labs(x = "Location (µm)", y = "Rheobase (pA)", colour = "Mouse")) +
  theme(axis.title = element_text(size = 7),
          axis.text = element_text(size = 7),
          axis.title.x = element_text(vjust = 2.5),
          legend.text = element_text(size = 5),
          legend.title=element_text(size=5),
          legend.key.size = unit(0.2, "cm"),
          legend.position = "right",
          legend.key = element_rect(fill = "white", colour = "black")) +
  scale_colour_manual(labels = c("", ""), values = c("red", "blue"))

```
:::


chunk: Figure 4B,C,D
:::
## Inter-animal variability and dependence on environment of intrinsic properties of stellate cells.

(**B, C**) Each line is the fit of simulated data from a different subject for datasets in which there is no inter-subject variability (**B**) or in which intersubject variability is present (**C**). Fitting data from each subject independently with linear regression models suggests intersubject variation in both datasets (left). By contrast, after fitting mixed effect models (right) intersubject variation is no longer suggested for the dataset in which it is absent (**B**) but remains for the dataset in which it is present (**C**). (**D**) Each line is the fit of rheobase as a function of dorsoventral location for a single mouse. The fits were carried out independently for each mouse (left) or using a mixed effect model with mouse identity as a random effect (right).

```{r Figure4B, warning=FALSE, message=FALSE, fig.cap="B"}
#' @width 12
#' @height 4

# Generate simulated data

# Makes properties for each subject
make_real_vals <-
  function(n_subjects,
           intersd_int = 1,
           intersd_slope = 1) {
    tb <-
      tibble(
        id = 1:n_subjects,
        int = rnorm(n_subjects, 1, intersd_int),
        slope = rnorm(n_subjects, 1, intersd_slope)
      )
  }

# Make data for each subject
make_data <-
  function(subject_params,
           loc_n = 20,
           loc_max = 1,
           intrasd = 1) {
    total_points <- c(loc_n * count(subject_params))[[1]]
    tb <- tibble(
      id = rep(subject_params$id, loc_n),
      loc = rep(runif(total_points, 0, loc_max)),
      param = loc * subject_params$slope +
        subject_params$int + rnorm(total_points, 0, intrasd)
    )
  }

# Formats figures
var_gp_format <- function(gg) {
  gg <- gg +
    ylim(0, 3) +
    ylab("Feature") +
    xlab("Location") +
    theme(axis.text = element_blank())
}

# Generate data simulating dorsoventral organisation without inter-animal variability
without_var_gp <- make_real_vals(20, 0, 0)
without_var_gp_data <- make_data(without_var_gp, 20, 1, 0.5)

# Plot fits to each animal separately
without_var_gp_gg <-
  ggplot(without_var_gp_data, aes(loc, param, group = id)) +
  stat_smooth(geom = "line", method = lm, se = FALSE) +
  ylim(0, 3)


without_var_gp_gg <- var_gp_format(without_var_gp_gg) +
  theme(axis.title = element_text(size = 9),
        axis.title.x = element_text(vjust = 2)) +
  labs(x = "") +
  scale_x_continuous(
    labels = function(breaks) {
      rep_along(breaks, "")
    }
  )

# Predictions for fitting with a mixed effect model
without_var_gp_mm <-
  lmer(param ~ loc + (loc ||
                        id), data = without_var_gp_data, REML = FALSE)
mm_without_predictplot <-
  predict_plot_2(
    without_var_gp_data,
    without_var_gp_mm,
    without_var_gp_data$loc,
    without_var_gp_data$param
  )

mm_without_predictplot <- var_gp_format(mm_without_predictplot) +
  annotate(
    "text",
    x = 0.6,
    y = 0.6,
    label = "paste(italic(R) ^ 2, \"marginal = .31\")",
    parse = TRUE,
    size = 2
  ) +
  annotate(
    "text",
    x = 0.6,
    y = 0.2,
    label = "paste(italic(R) ^ 2, \"conditional = .31\")",
    parse = TRUE,
    size = 2
  )  +
  theme(axis.title = element_text(size = 9),
        axis.title.x = element_text(vjust = 2))


mm_without_predictplot <- mm_without_predictplot +
  labs(x = "", y = "")

plot_grid(without_var_gp_gg, mm_without_predictplot)


# Generate data simulating dorsoventral organisation with inter-animal variability
with_var_gp <- make_real_vals(20, 0.2, 0.2)
with_var_gp_data <- make_data(with_var_gp, 20, 1, 0.2)

# Plot fits to each animal separately
with_var_gp_gg <-
  ggplot(with_var_gp_data, aes(loc, param, group = id)) +
  stat_smooth(geom = "line", method = lm, se = FALSE)

with_var_gp_gg <- var_gp_format(with_var_gp_gg) +
  theme(axis.title = element_text(size = 9),
        axis.title.x = element_text(vjust = 2))

# Predictions for fitting with a mixed effect model
with_var_gp_mm <-
  lmer(param ~ loc + (loc ||
                        id), data = with_var_gp_data, REML = FALSE)

mm_with_predictplot <-
  predict_plot_2(with_var_gp_data,
                 with_var_gp_mm,
                 with_var_gp_data$loc,
                 with_var_gp_data$param)

mm_with_predictplot <- var_gp_format(mm_with_predictplot) +
  annotate(
    "text",
    x = 0.6,
    y = 0.6,
    label = "paste(italic(R) ^ 2, \"marginal = .49\")",
    parse = TRUE,
    size = 2
  ) +
  annotate(
    "text",
    x = 0.6,
    y = 0.2,
    label = "paste(italic(R) ^ 2, \"conditional = .73\")",
    parse = TRUE,
    size = 2
  ) +
  theme(axis.title = element_text(size = 9),
        axis.title.x = element_text(vjust = 2))

mm_with_predictplot <- mm_with_predictplot +
  labs(y = "")

plot_grid(with_var_gp_gg, mm_with_predictplot)


# The code below requires mixed effect models to have been fit to data in data.sc_r

# Rheobase predictions from fits with a standard linear model
rheo_predictplot <-  ggplot(filter(data.sc_r, property == "rheo")$data[[1]], aes(x = dvlocmm, y = value, group = id)) +
    stat_smooth(geom = "line", method = lm, se = FALSE)
rheo_predictplot <- gg_rheo_format(rheo_predictplot, 0, 600)

rheo_predictplot_mod <- rheo_predictplot +
    theme(axis.title = element_text(size = 9), axis.text = element_text(size = 7), axis.title.x = element_text(vjust = 2.5))


# Rheobase predictions from mixed model fits
mm_rheo_predictplot <- predict_plot(filter(data.sc_r, property == "rheo")$data[[1]],
                               filter(data.sc_r, property == "rheo")$mm_vsris[[1]])
mm_rheo_predictplot  <- gg_rheo_format(mm_rheo_predictplot, 0, 600)

mm_rheo_predictplot_mod <- mm_rheo_predictplot +
    annotate("text", x = 1, y = 130, label = "paste(italic(R) ^ 2, \"marginal = .38\")", parse = TRUE, size = 2) +
    annotate("text", x = 1, y = 50, label = "paste(italic(R) ^ 2, \"conditional = .65\")", parse = TRUE, size = 2) +
    theme(axis.title = element_text(size = 9), axis.text = element_text(size = 7), axis.title.x = element_text(vjust = 2.5))

mm_rheo_predictplot_mod <- mm_rheo_predictplot_mod +
  labs(y = "")

plot_grid(rheo_predictplot_mod, mm_rheo_predictplot_mod)
```
:::
{#fig4bcd}

chunk: Figure 4E
:::
## Inter-animal variability and dependence on environment of intrinsic properties of stellate cells.

(**E**) The intercept (I), sum of the intercept and slope (I + S), and slopes realigned to a common intercept (right) for each mouse obtained by fitting mixed effect models for each property as a function of dorsoventral position.

```{r Figure4E, message=FALSE, warning=FALSE, fig.cap="E"}
#' @width 25
#' @height 25

combined_intercepts_slopes <-
  prep_int_slopes(data.sc_r, "property", "mm_vsris")

id_housing <-  distinct(data.sc, id, housing, age, sex)
combined_intercepts_slopes <-
  left_join(combined_intercepts_slopes, id_housing, by = "id")

combined_intercepts_slopes$property_factors <-
  as.factor(combined_intercepts_slopes$property)

combined_intercepts_slopes$property_factors = factor(
  combined_intercepts_slopes$property_factors,
  c(
    "vm",
    "ir",
    "sag",
    "tau",
    "resf",
    "resmag",
    "rheo",
    "fi",
    "ahp",
    "spkmax",
    "spkthr",
    "spkhlf"
  )
)


labels_intercepts <-
  c(
    ahp = "AHP min. (mV)",
    fi = "F-I (Hz / pA)",
    ir = "IR (MΩ)",
    resf = "Res F (Hz)",
    resmag = "Res. mag.",
    rheo = "Rheobase (pA)",
    sag = "Sag",
    spkhlf = "Spike h-w (ms)",
    spkmax = "Spike max. (mV)",
    spkthr = "Spike thres. (mV)",
    tau = "Tm (ms)",
    vm = "Vrest (mV)"
  )

print(
    ggplot(
      combined_intercepts_slopes,
      aes(x = measure, y = value_1, colour = housing)
    ) +
    geom_line(aes(group = id)) +
    geom_jitter(aes(y = value_2), width = 0.2, alpha = 0.5) +
    #geom_boxplot(aes(y = value_2), alpha = 0.1) +
    stat_summary(
      aes(y = value_2),
      fun.y = "mean",
      fun.ymin = "mean",
      fun.ymax = "mean",
      size = 0.2,
      geom = "crossbar"
    ) +
    scale_x_discrete(
      limits = c(
        "ind_intercept",
        "ind_intercept_slope",
        "global_intercept",
        "global_intercept_slope"
      ),
      label = c("I", "I + S", "", "")
    ) +
    facet_wrap(
      ~ property_factors,
      scales = "free",
      labeller = labeller(property_factors = labels_intercepts)
    ) +
    theme_classic() +
    hist_theme +
    theme(
      axis.line.x = element_blank(),
      axis.ticks.x = element_blank(),
      legend.position = "bottom"
    )
)
```
:::
{#fig4e}


figure: Figure 4—figure supplement 1
:::
![](elife-52258.rmd.media/fig4-figsupp1.jpg)

## Properties of SCs in medial and lateral slices.

Membrane properties of SCs from slices containing more medial (blue) and more lateral (red) parts of the MEC plotted as a function of dorsal ventral position. Neurons from more medial slices had a higher spike threshold, a lower spike maximum and a less-negative spike after-hyperpolarization (see [Supplementary file 6](#supp6)). Properties are labelled as in [Figure 2](#fig2).
:::
{#fig4}


Fitting the experimental measures for each feature with mixed models suggests that differences between animals contribute substantially to the variability in properties of SCs. In contrast to simulated data in which inter-animal differences are absent ([Figure 4B](#fig4)), differences in fits between animals remained after fitting with the mixed model ([Figure 4D](#fig4)). This corresponds with expectations from fits to simulated data containing inter-animal variability ([Figure 4C](#fig4)). To visualize inter-animal variability for all measured features, we plot for each animal the intercept of the model fit (I), the predicted value at a location 1 mm ventral from the intercept (I+S), and the slope (lines) ([Figure 4E](#fig4)). Strikingly, even for features such as rheobase and input resistance (IR) that are highly tuned to a neurons’ dorsoventral position, the extent of variability between animals is similar to the extent to which the property changes between dorsal and mid-levels of the MEC.

If set points that determine integrative properties of SCs do indeed differ between animals, then mixed models should provide a better account of the data than linear models that are generated by pooling data across all animals. Consistent with this, we found that mixed models for all electrophysiological features gave a substantially better fit to the data than linear models that considered all neurons as independent (adjusted p&lt;2×10^−17^ for all models, χ^2^ test, [Table 1](#table1)). Furthermore, even for properties with substantial (R^2^ value >0.1) dorsoventral tuning, the conditional R^2^ value for the mixed effect model was substantially larger than the marginal R^2^ value ([Figure 4D](#fig4) and [Table 1](#table1)). Together, these analyses demonstrate inter-animal variability in key electrophysiological features of SCs, suggesting that the set points that establish the underlying integrative properties differ between animals.

**Table 1. Dependence of the electrophysiological features of SCs on dorsoventral position.**

Key statistical parameters from analyses of the relationship between each measured electrophysiological feature and dorsoventral location. The data are ordered according to the marginal R2 for each property’s relationship with dorsoventral position. Slope is the population slope from fitting a mixed effect model for each feature with location as a fixed effect (mm−1), with p(slope) obtained by comparing this model to a model without location as a fixed effect (χ2 test). For all properties except the spike thereshold, the slope was unlikely to have arisen by chance (p<0.05). The marginal and conditional R2 values, and the minimum and maximum slopes across all mice, are obtained from the fits of mixed effect models that contain location as a fixed effect. The estimate p(vs linear) is obtained by comparing the mixed effect model containing location as a fixed effect with a corresponding linear model without random effects (χ2 test). The values of p(slope) and p(vs linear) were adjusted for multiple comparisons using the method of Benjamini and Hochberg (1995). Units for the slope measurements are units for the property mm−1. The data are from 27 mice.

| Feature              | Slope    | P (slope) | Marginal R^2^ | Conditional R^2^ | Slope (min) | Slope (max) | P (vs linear) |
| -------------------- | -------- | --------- | ------------- | ---------------- | ----------- | ----------- | ------------- |
| IR (MΩ)              | 11.794   | 8.39e-17  | 0.383         | 0.532            | 9.630       | 14.262      | 4.33e-40      |
| Rheobase (pA)        | −119.887 | 9.07e-15  | 0.382         | 0.652            | −153.873    | −76.130     | 6.55e-43      |
| I-F slope (Hz/pA)    | 0.036    | 6.06e-10  | 0.228         | 0.561            | 0.019       | 0.087       | 6.82e-34      |
| Tm (ms)              | 2.646    | 3.70e-12  | 0.192         | 0.343            | 1.809       | 3.979       | 1.20e-29      |
| Res. frequency (Hz)  | −1.334   | 4.13e-09  | 0.122         | 0.553            | −2.299      | −0.342      | 6.37e-65      |
| Sag                  | 0.033    | 6.06e-10  | 0.121         | 0.347            | 0.016       | 0.043       | 1.91e-38      |
| Spike maximum (mV)   | 1.900    | 1.85e-05  | 0.064         | 0.436            | −1.288      | 3.297       | 1.14e-50      |
| Res. magnitude       | −0.114   | 6.34e-08  | 0.064         | 0.198            | −0.138      | −0.087      | 9.13e-20      |
| Vm (mV)              | −0.884   | 3.67e-05  | 0.046         | 0.348            | −1.965      | 0.150       | 8.73e-35      |
| Spike AHP (mV)       | −0.645   | 1.93e-02  | 0.011         | 0.257            | −1.828      | 0.408       | 1.82e-17      |
| Spike width (ms)     | 0.017    | 1.93e-02  | 0.010         | 0.643            | −0.021      | 0.055       | 7.04e-139     |
| Spike threshold (mV) | 0.082    | 8.20e-01  | 0.000         | 0.510            | −2.468      | 2.380       | 2.03e-17      |

## Experience-dependence of intrinsic properties of stellate cells

Because neuronal integrative properties may be modified by changes in neural activity (@bib85), we asked whether experience influences the measured electrophysiological features of SCs. We reasoned that modifying the space through which animals can navigate may drive experience-dependent plasticity in the MEC. As standard mouse housing has dimensions less than the distance between the firing fields of more ventrally located grid cells (@bib12; @bib42), in a standard home cage, only a relatively small fraction of ventral grid cells is likely to be activated, whereas larger housing should lead to the activation of a greater proportion of ventral grid cells. We therefore tested whether the electrophysiological features of SCs differ between mice housed in larger environments (28,800 cm^2^) and those with standard home cages (740 cm^2^).

We compared the mixed models described above to models in which housing was also included as a fixed effect. To minimize the effects of age on SCs (@bib10; @bib16; [Supplementary file 2](#supp2)), we focused these and subsequent analyses on mice between P33 and P44 (N = 25, n = 779). We found that larger housing was associated with a smaller sag coefficient, indicating an increased sag response, a lower resonant frequency and a larger spike half-width (adjusted p&lt;0.05; [Figure 4E](#fig4), [Supplementary file 3](#supp3)). These differences were primarily from changes to the magnitude rather than the location-dependence of each feature. Other electrophysiological features appeared to be unaffected by housing.

To determine whether inter-animal differences remain after accounting for housing, we compared mixed models that include dorsoventral location and housing as fixed effects with equivalent linear regression models in which individual animals were not accounted for. Mixed models incorporating animal identity continued to provide a better account of the data, both for features that were dependent on housing (adjusted p&lt;2.8×10^−21^) and for features that were not (adjusted p&lt;1.4×10^−7^) ([Supplementary file 4](#supp4)).

Together, these data suggest that specific electrophysiological features of SCs may be modified by experience of large environments. After accounting for housing, significant inter-animal variation remains, suggesting that additional mechanisms acting at the level of animals rather than individual neurons also determine differences between SCs.

## Inter-animal differences remain after accounting for additional experimental parameters

To address the possibility that other experimental or biological variables could contribute to inter-animal differences, we evaluated the effects of home cage size ([Supplementary files 3](#supp3)–[4](#supp4)), brain hemisphere ([Supplementary file 5](#supp5)), mediolateral position ([Figure 4—figure supplement 1](#fig4s1) and [Supplementary file 6](#supp6)), the identity of the experimenter ([Supplementary file 7](#supp7)) and time since slice preparation ([Supplementary files 8](#supp8) and [9](#supp9)). Several of the variables influenced some measured electrophysiological features, for example properties primarily related to the action potential waveform depended on the mediolateral position of the recorded neuron ([Supplementary file 6](#supp6); @bib18; @bib83), but significant inter-animal differences remained after accounting for each variable. We carried out further analyses using models that included housing, mediolateral position, experimenter identity and the direction in which sequential recordings were obtained as fixed effects ([Supplementary file 10](#supp10)), and using models fit to minimal datasets in which housing, mediolateral position and the recording direction were identical ([Supplementary file 11](#supp11)). These analyses again found evidence for significant inter-animal differences.

Inter-animal differences could arise if the health of the recorded neurons differed between brain slices. To minimize this possibility, we standardized our procedures for tissue preparation (see 'Materials and methods'), such that slices were of consistent high quality as assessed by low numbers of unhealthy cells and by visualization of soma and dendrites of neurons in the slice. Several further observations are consistent with comparable quality of slices between experiments. First, if the condition of the slices had differed substantially between animals, then in better quality slices, it should be easier to record from more neurons, in which case features that depend on tissue quality would correlate with the number of recorded neurons. However, the majority (10/12) of the electrophysiological features were not significantly (p>0.2) associated with the number of recorded neurons ([Supplementary file 12](#supp12)). Second, analyses of inter-animal differences that focus only on data from animals for which >35 recordings were made, which should only be feasible with uniformly high-quality brain slices, are consistent with conclusions from analysis of the larger dataset ([Supplementary file 13](#supp13)). Third, the conditional R^2^ values of electrophysiological features of L2PCs are much lower than those for SCs recorded under the same experimental conditions ([Table 1](#table1) and [Supplementary file 1](#supp1)), suggesting that inter-animal variation may be specific to SCs and cannot be explained by slice conditions. Together, these analyses indicate that differences between animals remain after accounting for experimental and technical factors that might contribute to variation in the measured features of SCs.

## The distribution of intrinsic properties is consistent with a continuous rather than a modular organization

The dorsoventral organization of SC integrative properties is well established, but whether this results from within animal variation consistent with a small number of discrete set points that underlie a modular organization ([Figure 1B](#fig1)) is unclear. To evaluate modularity, we used datasets with n ≥ 34 SCs (N = 15 mice, median age = 37 days, age range = 18–43 days). We focus initially on rheobase, which is the property with the strongest correlation to dorsoventral location, and resonant frequency, which is related to the oscillatory dynamics underlying dorsoventral tuning in some models of grid firing (e.g. @bib14; @bib30). For n ≥ 34 SCs, we expect that if properties are modular, then this would be detected by the modified gap statistic in at least 50% of animals ([Figure 1—figure supplements 2C](#fig1s2) and [3](#fig1s3)). By contrast, we find that for datasets from the majority of animals, the modified gap statistic identifies only a single mode in the distribution of rheobase values ([Figure 5A](#fig5) and [Figure 6](#fig6)) (N = 13/15) and of resonant frequencies ([Figure 5B](#fig5) and [Figure 6](#fig6)) (N = 14/15), indicating that these properties have a continuous rather than a modular distribution. Consistent with this, smoothed distributions did not show clearly separated peaks for either property ([Figure 5](#fig5)). The mean and 95% confidence interval for the probability of evaluating a dataset as clustered (p~detect~) was 0.133 and 0.02–0.4 for rheobase and 0.067 and 0.002–0.32 for resonant frequency. These values of p~detect~ were not significantly different from the proportions expected given the false positive rate of 0.1 in the complete absence of clustering (p=0.28 and 0.66, binomial test). Thus, the rheobase and resonant frequency of SCs, although depending strongly on a neuron’s dorsoventral position, do not have a detectable modular organization.

figure: Figure 5
:::
![](elife-52258.rmd.media/fig5.jpg)

## Rheobase and resonant frequency do not have a detectable modular organization.

(**A, B**) Rheobase (**A**) and resonant frequency (**B**) are plotted as a function of dorsoventral position separately for each animal. Marker color and type indicate the age and housing conditions of the animal (‘+’ standard housing, ‘x’ large housing). KSDs (arbitrary bandwidth, same for all animals) are also shown. The number of clusters in the data (k~est~) is indicated for each animal (${\displaystyle \hat{\mathrm{k}}}$).
:::
{#fig5}


chunk: Figure 6
:::
## Significant modularity is not detectable for any measured property.

(**A, B**) Histograms showing the k~est ~(${\displaystyle \hat{\mathrm{k}}}$) counts across all mice for each different measured sub-threshold (**A**) and supra-threshold (**B**) intrinsic property. The maximum k evaluated was 8. The proportion of each measured property with k~est~>1 was compared using binomial tests (with N = 15) to the proportion expected if the underlying distribution of that property is always clustered with all separation between modes ≥5 standard deviations (pink text), or if the underlying distribution of the property is uniform (purple text). For all measured properties, the values of k~est~ (${\displaystyle \hat{\mathrm{k}}}$) were indistinguishable (p>0.05) from the data generated from a uniform distribution and differed from the data generated from a multi-modal distribution (p&lt;1×10^−6^).

```{r Figure6A, fig.cap="A"}
#' @width 25
#' @height 10

# Note that reference data in GapData is generated through simulation. It was re-generated for the executable manuscript and so plots may look slightly different to the original manuscript. The conclusions are robust to the seeds for data generation. The reference data is loaded/generated with the setup code above (see block: "Clustering Gap Data Setup").

# Evaluate the stellate cell (sc) data for k_est
fname <- "Data/datatable_sc.txt"
sc_data <-
  read_tsv(
    fname,
    col_types = cols(
      .default = "d",
      mlpos = "c",
      hemi = "c",
      id = "c",
      housing = "c",
      expr = "c",
      patchdir = "c"
    )
  )

sc_data_long <-
  sc_data %>% filter(totalcells >= 30) %>% select(property_names, id, dvloc) %>% gather(key =
                                                                                          "property",
                                                                                        value = "measurement",
                                                                                        property_names,
                                                                                        na.rm = TRUE)

# Initialise list to store tibbles for cluster evaluation of each combination of animal
# and measured property
sc_cluster_data_list <- list()
list_ind <- 0L # To track the list index inside the double loop

for (m in unique(sc_data_long$id)) {
  for (p in property_names) {
    # The data for one measured property for a single animal
    data <- sc_data_long %>% filter(id == m, property == p)
    
    # Use the fitted parameters to obtain thresholds and dispersions. Note that where there
    # are measurement NAs in the original data, they will have already been removed, so the
    # fits are for the right dataset size.
    thresholds <- get_thresh_from_params(data, k.max,
                                         threshold_fit_results$thresh_params[[1]])
    dispersions <- get_dispersions_from_params(data, k.max,
                                               dispersion_fit_results$dispersion_params[[1]])
    
    # The clustering evaluation outputs
    cluster_out <-
      get_k_est(
        data$measurement,
        kmeans,
        iter.max = iter.max,
        nstart = nstart,
        K.max = k.max,
        B = NULL,
        d.power = d.power,
        thresholds = thresholds,
        dispersions = dispersions
      )
    
    # Construct tibble that combines cluster evaluation output with other data in long format
    list_ind <- list_ind + 1 # Increment list index counter
    dimnames(cluster_out$cluster)[[2]] <-
      as.integer(1:k.max) # to set 'as.tibble()' behaviour
    sc_cluster_data_list[[list_ind]] <-
      as.tibble(cluster_out$cluster) %>% bind_cols(data) %>%
      mutate(k_est = rep(cluster_out$k_est, nrow(cluster_out$cluster))) %>%
      gather(key = "k_eval",
             value = "cluster",
             dimnames(cluster_out$cluster)[[2]])
  }
}
# Aggregated data in long format. Most granular level is single measurement cluster membership
# for a specified k. This is the main input to further SC data analyses.
sc_cluster_data_long <- bind_rows(sc_cluster_data_list) %>%
  mutate(k_eval = as.integer(k_eval), cluster = as.factor(cluster)) %>% # Change class for plotting
  mutate(prop_type = case_when(property %in% props_subthresh ~ "subthr", TRUE ~ "suprathr"))

# Plot k_est counts for subthreshold and suprathreshold properties
(
  clusters_a <-
    sc_cluster_data_long %>% filter(prop_type == "subthr") %>% distinct(id, property, k_est) %>%
    ggplot() + geom_bar(aes(x = k_est)) +
    facet_wrap(
      vars(property),
      ncol = 6,
      labeller = as_labeller(props_sub_labels)
    ) +
    labs(y = "Count", x = " ") +
    scale_x_continuous(
      labels = c(rep(" ", k.max)),
      breaks = 1:k.max,
      limits = c(0.5, k.max + 0.5)
    )
)

# Required Figure 6A to gave been generated first
(
  clusters_b <-
    sc_cluster_data_long %>% filter(prop_type == "suprathr") %>%
    distinct(id, property, k_est) %>%
    ggplot() + geom_bar(aes(x = k_est)) +
    facet_wrap(
      vars(property),
      ncol = 6,
      labeller = as_labeller(props_supra_labels)
    ) +
    labs(
      x = bquote( ~ "Estimated number of clusters, " ~ k[est]),
      y = "Count"
    ) +
    scale_x_continuous(
      labels = c("1", rep("", k.max - 2), toString(k.max)),
      breaks = 1:k.max,
      limits = c(0.5, k.max + 0.5)
    )
)
```
:::
{#fig6}


When we investigated the other measured integrative properties, we also failed to find evidence for modularity. Across all properties, for any given property, at most 3 out of 15 mice were evaluated as having a clustered organization using the modified gap statistic ([Figure 6](#fig6)). This does not differ significantly from the proportion expected by chance when no modularity is present (p>0.05, binomial test). Consistent with this, the average proportion of datasets evaluated as modular across all measured features was 0.072 ± 0.02 (± SEM), which is similar to the expected false-positive rate. By contrast, the properties of grid firing fields previously recorded with tetrodes in behaving animals (@bib71) were detected as having a modular organization using the modified gap statistic ([Figure 1—figure supplement 5](#fig1s5)). For seven grid-cell datasets with n ≥ 20, the mean for p~detect~ is 0.86, with 95% confidence intervals of 0.42 to 0.996. We note here that discontinuity algorithms that were previously used to assess the modularity of grid field properties (@bib32; @bib71) did indicate significant modularity in the majority of the intrinsic properties measured in our dataset (N = 13/15 and N = 12/15, respectively), but this was attributable to false positives resulting from the relatively even sampling of recording locations (see [Figure 1—figure supplement 4A](#fig1s4)). Therefore, we conclude that it is unlikely that any of the intrinsic integrative properties of SCs that we examined have organization within individual animals resembling the modular organization of grid cells in behaving animals.

## Multiple sources of variance contribute to diversity in stellate cell intrinsic properties

Finally, because many of the measured electrophysiological features of SCs emerge from shared ionic mechanisms (@bib21; @bib28; @bib58), we asked whether dorsoventral tuning reflects a single core mechanism and whether inter-animal differences are specific to this mechanism or manifest more generally.

Estimation of conditional independence for measurements at the level of individual neurons ([Figure 7A](#fig7)) or individual animals ([Figure 7B](#fig7)) was consistent with the expectation that particular classes of membrane ion channels influence multiple electrophysiologically measured features. The first five dimensions of a principal components analysis (PCA) of all measured electrophysiological features accounted for almost 80% of the variance ([Figure 7C](#fig7)). Examination of the rotations used to generate the principal components suggested relationships between individual features that are consistent with our evaluation of the conditional independence structure of the measured features ([Figure 7D and A](#fig7)). When we fit the principal components using mixed models with location as a fixed effect and animal identity as a random effect, we found that the first two components depended significantly on dorsoventral location ([Figure 7E](#fig7) and [Supplementary file 14](#supp14)) (marginal R^2^ = 0.50 and 0.09 and adjusted p=1.09×10^−15^ and 1.05 × 10^−4^, respectively). Thus, the dependence of multiple electrophysiological features on dorsoventral position may be reducible to two core mechanisms that together account for much of the variability between SCs in their intrinsic electrophysiology.


chunk: Figure 7A
:::
## Feature relationships and inter-animal variability after reducing dimensionality of the data.

(**A**) Partial correlations between the electrophysiological features investigated at the level of individual neurons. Partial correlations outside of the 95% basic bootstrap confidence intervals are color coded. Non-significant correlations are colored white. Properties shown are the resting membrane potential (Vm), input resistance (IR), membrane potential sag response (sag), membrane time constant (Tm), resonance frequency (Rm), resonance magnitude (Rm), rheobase (Rheo), slope of the current frequency relationship (FI), peak of the action potential after hyperpolarization (AHP), peak of the action potential (Smax) voltage threshold for the action potential (Sthr) and half-width of the action potential (SHW).

```{r Figure7A, warning=FALSE, fig.cap="A"}
#' @width 8
#' @height 8

data.sc_neurons <- data.sc %>% dplyr::select(vm:fi) %>%
  na.omit
Q_neurons <- calcQ(data.sc_neurons)

Q_neurons <- as.matrix(Q_neurons)
colnames(Q_neurons) <- colnames(data.sc_neurons)
rownames(Q_neurons) <- colnames(data.sc_neurons)

order_names <-
  c(
    "vm",
    "ir",
    "sag",
    "tau",
    "resf",
    "resmag",
    "rheo",
    "fi",
    "ahp",
    "spkmax",
    "spkthr",
    "spkhlf"
  )

Q_neurons <-
  Q_neurons[match(order_names, rownames(Q_neurons)), match(order_names, colnames(Q_neurons))]

new_names <-
  c("Vm",
    "IR",
    "Sag",
    "Tm",
    "ResF",
    "ResM",
    "Rheo",
    "F-I",
    "AHP",
    "Smax",
    "Sthr",
    "SHW")

colnames(Q_neurons) <- new_names
rownames(Q_neurons) <- new_names

(
  Q_neurons_plot <-
    ggcorr(
      data = NULL,
      cor_matrix = Q_neurons,
      geom = "circle",
      min_size = 0,
      max_size = 5,
      legend.size = 7,
      size = 2
    )
)
```
:::
{#fig7a}


chunk: Figure 7B
:::
## Feature relationships and inter-animal variability after reducing dimensionality of the data.

(**B**) Partial correlations between the electrophysiological features investigated at the level of animals. Partial correlations outside of the 95% basic bootstrap confidence intervals are color coded. Non-significant correlations are colored white. Properties shown are the resting membrane potential (Vm), input resistance (IR), membrane potential sag response (sag), membrane time constant (Tm), resonance frequency (Rm), resonance magnitude (Rm), rheobase (Rheo), slope of the current frequency relationship (FI), peak of the action potential after hyperpolarization (AHP), peak of the action potential (Smax) voltage threshold for the action potential (Sthr) and half-width of the action potential (SHW).

```{r Figure7B, warning=FALSE, fig.cap="B"}
#' @width 8
#' @height 8

data_summary.mm_vsris <- prep_int_slopes_PCA(data.sc_r, "property", "mm_vsris")

data_intercepts <-  spread(data_summary.mm_vsris[,1:3], property, ind_intercept)
data_slopes <-  spread(data_summary.mm_vsris[,c(1:2, 4)], property, ind_slope)


data.sc_fits <- data_intercepts %>% dplyr::select(ahp:vm) %>%
  na.omit

Q_intercepts <- calcQ(data.sc_fits)

Q_intercepts <- as.matrix(Q_intercepts)
colnames(Q_intercepts) <- colnames(data.sc_fits)
rownames(Q_intercepts) <- colnames(data.sc_fits)

Q_intercepts <- Q_intercepts[match(order_names, rownames(Q_intercepts)), match(order_names, colnames(Q_intercepts))]

colnames(Q_intercepts) <- new_names
rownames(Q_intercepts) <- new_names

(Q_intercepts_plot <- ggcorr(data = NULL, cor_matrix = Q_intercepts, geom = "circle", min_size = 0, max_size = 5, legend.size = 7, size = 2))

```
:::
{#fig7b}


chunk: Figure 7C
:::
## Feature relationships and inter-animal variability after reducing dimensionality of the data.

(**C**) Proportion of variance explained by each principal component. To remove variance caused by animal age and the experimenter identity, the principal component analysis used a reduced dataset obtained by one experimenter and restricted to animals between 32 and 45 days old (N = 25, n = 572).

```{r Figure7C, fig.cap="C"}
#' @width 8
#' @height 8

data.pca <- dplyr::select(data.sc, vm:fi, dvlocmm, id, housing, id, mlpos, hemi, sex, age, housing, expr, patchdir, rectime) %>%
  filter(age > 32 & age <45 & expr == "HP")

all.pca <- prcomp(drop_na(data.pca[1:12], fi),
                  retx = TRUE,
                  scale = TRUE)

prop.var.df <- as.data.frame(summary(all.pca)$importance[2,])
colnames(prop.var.df) <- c("PropVar")
prop.var.df$components <- 1:12
(all.pca.prop.var.plot <- ggplot(prop.var.df, aes(components, PropVar)) +
  geom_bar(stat = "identity") +
  scale_x_discrete("Component") +
  ylab("Proportion of variance") +
    theme(title = element_text(size = 9), axis.text = element_text(size = 9)))
```
:::
{#fig7c}


chunk: Figure 7D
:::
## Feature relationships and inter-animal variability after reducing dimensionality of the data.

(**D**) Loading plot for the first two principal components.

```{r Figure7D, fig.cap="D"}
#' @width 8
#' @height 8

# Requires Figure 7C to have been generated.
(all.pca.biplot <- pca_biplot(as.data.frame(all.pca$rotation), fontsize = 2, order_names = order_names, new_names = c("Vm", "IR", "Sag", "Tm", "Res. F", "Res. Mag.", "Rheo", "FI", "AHPmax", "Spk. max", "Spk. thr", "Spk. HW")) +
    theme(text = element_text(size = 9)) +
    xlim(-0.5, 0.55))
```
:::
{#fig7d}


chunk: Figure 7E
:::
## Feature relationships and inter-animal variability after reducing dimensionality of the data.

(**E**) The first five principal components plotted as a function of position. (**F**) Intercept (I), intercept plus the slope (I + S) and slopes aligned to the same intercept, for fits for each animal of the first five principal components to a mixed model with location as a fixed effect and animal as a random effect.

```{r Figure7E, fig.cap="E"}
#' @width 13
#' @height 7

all.pca.x <- bind_cols(as_tibble(all.pca$x), drop_na(data.pca, fi))

out.pca.x_g1_5 <- all.pca.x %>%
  gather("component", "value", 1:5)
(
  pc1to5_plot <-
    ggplot(data = out.pca.x_g1_5, aes(
      x = dvlocmm, y = value, colour = housing
    )) +
    geom_point(alpha = 0.5) +
    facet_wrap(~ component, ncol = 5) +
    scale_x_continuous("DV location (mm)", c(0, 1, 2)) +
    theme_classic() +
    PCA_theme +
    theme(legend.position = "bottom")
)
```
:::
{#fig7e}


chunk: Figure 7F
:::
## Feature relationships and inter-animal variability after reducing dimensionality of the data.

(**F**) Intercept (I), intercept plus the slope (I + S) and slopes aligned to the same intercept, for fits for each animal of the first five principal components to a mixed model with location as a fixed effect and animal as a random effect.

```{r Figure7F, warning=FALSE, fig.cap="F"}
#' @width 13
#' @height 6

# Reform data for use with dplyr.
out.pca.x_g <- all.pca.x %>%
  gather("component", "value", 1:12) %>%
  group_by(component) %>%
  nest() %>%
  ungroup()

# Fit models
out.pca.x_g <- out.pca.x_g %>%
  mutate(mixedmodel_vsris = map(data, model_vsris)) %>%
  mutate(mixedmodel_vsris_null = map(data, model_vsris_null))

# Extract summary data from the fit
out.pca.x_g <- out.pca.x_g %>%
  mutate(vsris_glance = map(mixedmodel_vsris, broom::glance)) %>%
  mutate(vsris_null_glance = map(mixedmodel_vsris_null, broom::glance))

# Preparation for making the figure
combined_intercepts_slopes_PCA <-
  prep_int_slopes(out.pca.x_g, "component", "mixedmodel_vsris")

id_housing_PCA <-  distinct(all.pca.x, id, housing)

combined_intercepts_slopes_PCA <-
  left_join(combined_intercepts_slopes_PCA, id_housing_PCA, by = "id")

combined_intercepts_slopes_PCA$component_factors <-
  as.factor(combined_intercepts_slopes_PCA$component)

combined_intercepts_slopes_PCA$component_factors = factor(
  combined_intercepts_slopes_PCA$component_factors,
  c(
    "PC1",
    "PC2",
    "PC3",
    "PC4",
    "PC5",
    "PC6",
    "PC7",
    "PC8",
    "PC9",
    "PC10",
    "PC11"
  )
)

# Plot the figure
print(
  ggplot(
    subset(
      combined_intercepts_slopes_PCA,
      component %in% c("PC1", "PC2", "PC3", "PC4", "PC5")
    ),
    aes(x = measure, y = value_1, colour = housing)
  ) +
  geom_line(aes(group = id)) +
  geom_jitter(aes(y = value_2), width = 0.2) +
  scale_x_discrete(
    limits = c(
      "ind_intercept",
      "ind_intercept_slope",
      "global_intercept",
      "global_intercept_slope"
    ),
    label = c("I", "I + S", "", "")
  ) +
  facet_wrap( ~ component_factors, ncol = 5) +
  theme_classic() +
  hist_theme +
  theme(
    axis.line.x = element_blank(),
    axis.ticks.x = element_blank(),
    axis.text.x = element_text(angle = 90)
  ) +
  theme(legend.position = "none")
)
```
:::
{#fig7f}


Is inter-animal variation present in PCA dimensions that account for dorsoventral variation? The intercept, but not the slope of the dependence of the first two principal components on dorsoventral position depended on housing (adjusted p=0.039 and 0.027) ([Figure 7E,F](#fig7) and [Supplementary file 15](#supp15)). After accounting for housing, the first two principal components were still better fit by models that include animal identity as a random effect (adjusted p=3.3×10^−9^ and 4.1 × 10^−86^), indicating remaining inter-animal differences in these components ([Supplementary file 16](#supp16)). A further nine of the next ten higher-order principal components did not depend on housing (adjusted p>0.1) ([Supplementary file 15](#supp15)), while eight differed significantly between animals (adjusted p&lt;0.05) ([Supplementary file 16](#supp16)).

Together, these analyses indicate that the dorsoventral organization of multiple electrophysiological features of SCs is captured by two principal components, suggesting two main sources of variance, both of which are dependent on experience. Significant inter-animal variation in the major sources of variance remains after accounting for experience and experimental parameters.

# Discussion

Phenotypic variation is found across many areas of biology (@bib29), but has received little attention in investigations of mammalian nervous systems. We find unexpected inter-animal variability in SC properties, suggesting that the integrative properties of neurons are determined by set points that differ between animals and are controlled at a circuit level ([Figure 8](#fig8)). Continuous, location-dependent organization of set points for SC integrative properties provides new constraints on models for grid cell firing. More generally, the existence of inter-animal differences in set points has implications for experimental design and raises new questions about how the integrative properties of neurons are specified.

figure: Figure 8
:::
![](elife-52258.rmd.media/fig8.jpg)

## Models for intra- and inter-animal variation.

(**A**) The configuration of a cell type can be conceived of as a trough (arrow head) in a developmental landscape (solid line). In this scheme, the trough can be considered as a set point. Differences between cells (filled circles) reflect variation away from the set point. (**B**) Neurons from different animals or located at different dorsoventral positions can be conceptualized as arising from different troughs in the developmental landscape. (**C**) Variation may reflect inter-animal differences in factors that establish gradients (upper left) and in factors that are uniformly distributed (lower left), combining to generate set points that depend on animal identity and location (right). Each line corresponds to schematized properties of a single animal.
:::
{#fig8}


## A conceptual framework for within cell type variability

Theoretical models suggest how different cell types can be generated by varying target concentrations of intracellular Ca^2+^ or rates of ion channel expression (@bib56). The within cell type variability predicted by these models arises from different initial conditions and may explain the variability in our data between neurons from the same animal at the same dorsoventral location ([Figure 8A](#fig8)). By contrast, the dependence of integrative properties on position and their variation between animals implies additional mechanisms that operate at the circuit level ([Figure 8B](#fig8)). In principle, this variation could be accounted for by inter-animal differences in dorsoventrally tuned or spatially uniform factors that influence ion channel expression or target points for intracellular Ca^2+^ ([Figure 8C](#fig8)).

The mechanisms for within cell type variability that are suggested by our results may differ from inter-animal variation described in invertebrate nervous systems. In invertebrates, inter-animal variability is between properties of individual identified neurons (@bib36), whereas in mammalian nervous systems, neurons are not individually identifiable and the variation that we describe here is at the level of cell populations. From a developmental perspective in which cell identity is considered as a trough in a state-landscape through which each cell moves (@bib79), variation in the population of neurons of the same type could be conceived as cell autonomous deviations from set points corresponding to the trough ([Figure 8A](#fig8)). Our finding that variability among neurons of the same type manifests between as well as within animals, could be explained by differences between animals in the position of the trough or set point in the developmental landscape ([Figure 8B](#fig8)).

Our comparison of neurons from animals in standard and large cages provides evidence for the idea that within cell-type excitable properties are modified by experience (@bib85). For example, granule cells in the dentate gyrus that receive input from SCs increase their excitability when animals are housed in enriched environments (@bib38; @bib57). Our experiments differ in that we increased the size of the environment with the goal of activating more ventral grid cells, whereas previous enrichment experiments have focused on increasing the environmental complexity and availability of objects for exploration. Further investigation will be required to dissociate the influence of each factor on excitability.

## Implications of continuous dorsoventral organization of stellate cell integrative properties for grid cell firing

Dorsoventral gradients in the electrophysiological features of SCs have stimulated cellular models for the organization of grid firing (@bib15; @bib34; @bib39; @bib55; @bib81). Increases in spatial scale following deletion of HCN1 channels (@bib31), which in part determine the dorsoventral organization of SC integrative properties (@bib28; @bib35), support a relationship between the electrophysiological properties of SCs and grid cell spatial scales. Our data argue against models that explain this relationship through single cell computations (@bib15; @bib14; @bib30), as in this case, the modularity of integrative properties of SCs is required to generate modularity of grid firing. A continuous dorsoventral organization of the electrophysiological properties of SCs could support the modular grid firing generated by self-organizing maps (@bib39) or by synaptic learning mechanisms (@bib45; @bib76). It is less clear how a continuous gradient would affect the organization of grid firing predicted by continuous attractor network models, which can instead account for modularity by limiting synaptic interactions between modules (@bib13; @bib17; @bib26; @bib41; @bib70; @bib81; @bib82). Modularity of grid cell firing could also arise through the anatomical clustering of calbindin-positive L2PCs (@bib63; @bib64). Because many SCs do not appear to generate grid codes and as the most abundant functional cell type in the MEC appears to be non-grid spatial neurons (@bib20; @bib43), the continuous dorsoventral organization of SC integrative properties may also impact grid firing indirectly through modulation of these codes.

Our results add to previous comparisons of medially and laterally located SCs (@bib18; @bib83). The similar dorsoventral organization of subthreshold integrative properties of SCs from medial and lateral parts of the MEC appears consistent with the organization of grid cell modules recorded in behaving animals (@bib71). How mediolateral differences in firing properties ([Figure 4—figure supplement 1](#fig4s1); @bib18; @bib83) might contribute to spatial computations within the MEC is unclear.

The continuous dorsoventral variation of the electrophysiological features of SCs suggested by our analysis is consistent with continuous dorsoventral gradients in gene expression along layer 2 of the MEC (@bib62). For example, labelling of the mRNA and protein for the HCN1 ion channel suggests a continuous dorsoventral gradient in its expression (@bib53; @bib62). It is also consistent with single-cell RNA sequencing analysis of other brain areas, which indicates that although the expression profiles for some cell types cluster around a point in feature space, others lie along a continuum (@bib19). It will be interesting in future to determine whether gene expression continua establish corresponding continua of electrophysiological features (@bib47).

## Functional consequences of within cell type inter-animal variability

What are the functional roles of inter-animal variability? In the crab stomatogastric ganglion, inter-animal variation correlates with circuit performance (@bib36). Accordingly, variation in intrinsic properties of SCs might correlate with differences in grid firing (@bib22; @bib40; @bib66; @bib68) or behaviors that rely on SCs (@bib44; @bib61; @bib74). It is interesting in this respect that there appear to be inter-animal differences in the spatial scale of grid modules (Figure 5 of @bib71). Modification of grid field scaling following deletion of HCN1 channels is also consistent with this possibility (@bib31; @bib48). Alternatively, inter-animal differences may reflect multiple ways to achieve a common higher-order phenotype. According to this view, coding of spatial location by SCs would not differ between animals despite lower level variation in their intrinsic electrophysiological features. This is related to the idea of degeneracy at the level of single-cell electrophysiological properties (@bib49; @bib51; @bib56; @bib73), except that here the electrophysiological features differ between animals whereas the higher-order circuit computations may nevertheless be similar.

In conclusion, our results identify substantial within cell type variation in neuronal integrative properties that manifests between as well as within animals. This has implications for experimental design and model building as the distribution of replicates from the same animal will differ from those obtained from different animals (@bib50). An important future goal will be to distinguish causes of inter-animal variation. Many behaviors are characterized by substantial inter-animal variation (e.g. @bib77), which could result from variation in neuronal integrative properties, or could drive this variation. For example, it is possible that external factors such as social interactions may affect brain circuitry (@bib78; @bib80), although these effects appear to be focused on frontal cortical structures rather than circuits for spatial computations (@bib80). Alternatively, stochastic mechanisms operating at the population level may drive the emergence of inter-animal differences during the development of SCs (@bib23; @bib64). Addressing these questions may turn out to be critical to understanding the relationship between cellular biophysics and circuit-level computations in cognitive circuits (@bib69).

# Materials and methods

## Mouse strains

All experimental procedures were performed under a United Kingdom Home Office license and with approval of the University of Edinburgh’s animal welfare committee. Recordings of many SCs per animal used C57Bl/6J mice (Charles River). Recordings targeting calbindin cells used a _Wfs1_^Cre^ line (_Wfs1_-Tg3-CreERT2) obtained from Jackson Labs (Strain name: B6;C3-Tg(_Wfs1_-cre/ERT2)3Aibs/J; stock number:009103) crossed to RCE:loxP (R26R CAG-boosted EGFP) reporter mice (described in @bib52). To promote expression of Cre in the mice, tamoxifen (Sigma, 20 mg/ml in corn oil) was administered on three consecutive days by intraperitoneal injections, approximately 1 week before experiments. Mice were group housed in a 12 hr light/dark cycle with unrestricted access to food and water (light on 07.30–19.30 hr). Mice were usually housed in standard 0.2 × 0.37 m cages that contained a cardboard roll for enrichment. A subset of the mice was instead housed from pre-weaning ages in a larger 2.4 × 1.2 m cage that was enriched with up to 15 bright plastic objects and eight cardboard rolls ([Figure 2—figure supplement 1](#fig2s1)). Thus, the large cages had more items but at a slightly lower density (large cages — up to 1 item per 0.125 m^2^; standard cages — 1 item per 0.074 m^2^). All experiments in the standard cage used male mice. For experiments in the large cage, two mice were female, six mice were male and one was not identified. Because we did not find significant effects of sex on individual electrophysiologically properties, all mice were included in the analyses reported in the text. When only male mice were included, the effects of housing on the first principal component remained significant, whereas the effects of housing on individual electrophysiologically properties no longer reach statistical significance after correcting for multiple comparisons. Additional analyses that consider only male mice are provided in the code associated with the manuscript.

## Slice preparation

Methods for preparation of parasagittal brain slices containing medial entorhinal cortex were based on procedures described previously (@bib59). Briefly, mice were sacrificed by cervical dislocation and their brains carefully removed and placed in cold (2–4°C) modified ACSF, with composition (in mM): NaCl 86, NaH~2~PO~4~ 1.2, KCl 2.5, NaHCO~3~ 25, glucose 25, sucrose 75, CaCl~2~ 0.5, and MgCl~2~ 7. Brains were then hemisected and sectioned, also in modified ACSF at 4–8°C, using a vibratome (Leica VT1200S). To minimize variation in the slicing angle, the hemi-section was cut along the midline of the brain and the cut surface of the brain was glued to the cutting block. After cutting, brains were placed at 36°C for 30 min in standard ACSF, with composition (in mM): NaCl 124, NaH~2~PO4 1.2, KCl 2.5, NaHCO~3~ 25, glucose 20, CaCl~2~ 2, and MgCl~2~ 1. They were then allowed to cool passively to room temperature. All slices were prepared by the same experimenter (HP), who followed the same procedure on each day.

## Recording methods

Whole-cell patch-clamp recordings were made between 1 to 10 hr after slice preparation using procedures described previously (@bib60; @bib58; @bib59; @bib72). Recordings were made from slice perfused in standard ACSF maintained at 34–36°C. In these conditions, we observe spontaneous fast inhibitory and excitatory postsynaptic potentials, but do not find evidence for tonic GABAergic or glutamatergic currents. Patch electrodes were filled with the following intracellular solution (in mM): K gluconate 130; KCl 10, HEPES 10, MgCl~2~ 2, EGTA 0.1, Na~2~ATP 2, Na~2~GTP 0.3 and NaPhosphocreatine 10. The open tip resistance was 4–5 MΩ, all seal resistances were >2 GΩ and series resistances were &lt;30 MΩ. Recordings were made in current clamp mode using Multiclamp 700B amplifiers (Molecular Devices, Sunnyvale, CA, USA) connected to PCs via Instrutech ITC-18 interfaces (HEKA Elektronik, Lambrecht, Germany) and using Axograph X acquisition software (<http://axographx.com/>). Pipette capacitance and series resistances were compensated using the capacitance neutralization and bridge-balance amplifier controls. An experimentally measured liquid junction potential of 12.9 mV was not corrected for. Stellate cells were identified by their large sag response and the characteristic waveform of their action potential after hyperpolarization (see @bib2; @bib37; @bib53; @bib58).

To maximize the number of cells recorded per animal, we adopted two strategies. First, to minimize the time required to obtain data from each recorded cell, we measured electrophysiological features using a series of three short protocols following initiation of stable whole-cell recordings. We used responses to sub-threshold current steps to estimate passive membrane properties ([Figure 2A](#fig2)), a frequency modulated sinusoidal current waveform (ZAP waveform) to estimate impedance amplitude profiles ([Figure 2B](#fig2)), and a linear current ramp to estimate the action potential threshold and firing properties ([Figure 2C](#fig2)). From analysis of data obtained with these protocols, we obtained 12 quantitative measures that describe the sub- and supra-threshold integrative properties of each recorded cell ([Figure 2A–C](#fig2)). Second, for the majority of mice, two experimenters made recordings in parallel from neurons in two sagittal brain sections from the same hemisphere. The median dorsal-ventral extent of the recorded SCs was 1644 µm (range 0–2464 µm). Each experimenter aimed to sample recording locations evenly across the dorsoventral extent of the MEC, and for most animals, each experimenter recorded sequentially from opposite extremes of the dorsoventral axis. Each experimenter varied the starting location for recording between animals. For several features, the direction of recording affected their measured dependence on dorsoventral location, but the overall dependence of these features on dorsoventral location was robust to this effect ([Supplementary file 9](#supp9)).

## Measurement of electrophysiological features and neuronal location

Electrophysiological recordings were analyzed in Matlab (Mathworks) using a custom-written semi-automated pipeline. We defined each feature as follows (see also @bib53; @bib58): resting membrane potential was the mean of the membrane potential during the 1 s prior to injection of the current steps used to assess subthreshold properties; input resistance was the steady-state voltage response to the negative current steps divided by their amplitude; membrane time constant was the time constant of an exponential function fit to the initial phase of membrane potential responses to the negative current steps; the sag coefficient was the steady-state divided by the peak membrane potential response to the negative current steps; resonance frequency was the frequency at which the peak membrane potential impedance was found to occur; resonance magnitude was the ratio between the peak impedance and the impedance at a frequency of 1 Hz; action potential threshold was calculated from responses to positive current ramps as the membrane potential at which the first derivative of the membrane potential crossed 20 mv ms^−1^ averaged across the first five spikes, or fewer if fewer spikes were triggered; rheobase was the ramp current at the point when the threshold was crossed on the first spike; spike maximum was the mean peak amplitude of the action potentials triggered by the positive current ramp; spike width was the duration of the action potentials measured at the voltage corresponding to the midpoint between the spike threshold and spike maximum; the AHP minimum was the negative peak membrane potential of the after hyperpolarization following the first action potential when a second action potential also occurred; and the F-I slope was the linear slope of the relationship between the spike rate and the injected ramp current over a 500 ms window.

The location of each recorded neuron was estimated as described previously (@bib28; @bib59). Following each recording, a low magnification image was taken of the slice with the patch-clamp electrode at the recording location. The image was calibrated and then the distance measured from the dorsal border of the MEC along the border of layers 1 and 2 to the location of the recorded cell.

## Analysis of location-dependence, experience-dependence and inter-animal differences

Analyses of location-dependence and inter-animal differences used R 3.4.3 (R Core Team, Vienna, Austria) and R Studio 1.1.383 (R Studio Inc, Boston, MA).

To fit linear mixed effect models, we used the lme4 package (@bib8). Features of interest were included as fixed effects and animal identity was included as a random effect. All reported analyses are for models with the minimal a priori random effect structure, in other words the random effect was specified with uncorrelated slope and intercept. We also evaluated models in which only the intercept, or correlated intercept and slope were specified as the random effect. Model assessment was performed using Akaike Information Criterion (AIC) scores. In general, models with either random slope and intercept, or correlated random slope and intercept, had lower AIC scores than random intercept only models, indicating a better fit to the data. We used the former set of models for all analyses of all properties for consistency and because a maximal effect structure may be preferable on theoretical grounds (@bib5). We evaluated assumptions including linearity, normality, homoscedasticity and influential data points. For some features, we found modest deviations from these assumptions that could be remedied by handling non-linearity in the data using a copula transformation. Model fits were similar following transformation of the data. However, we focus here on analyses of the untransformed data because the physical interpretation of the resulting values for data points is clearer.

To evaluate the location-dependence of a given feature, p-values were calculated using a χ^2^ test comparing models to null models with no location information but identical random effect specification. To calculate marginal and conditional R^2^ of mixed effect models, we used the MuMin package (@bib7). To evaluate additional fixed effects, we used Type II Wald χ^2^ test tests provided by the car package (@bib25). To compare mixed effect with equivalent linear models, we used a χ^2^ test to compare the calculated deviance for each model. For clarity, we have reported key statistics in the main text and provide full test statistics in the Supplemental Tables. In addition, the code from which the analyses can be fully reproduced is available at <https://github.com/MattNolanLab/Inter_Intra_Variation> (@bib54; copy archived at <https://github.com/elifesciences-publications/Inter_Intra_Variation>).

To evaluate partial correlations between features, we used the function cor2pcor from the R package corpcor (@bib67). Principal components analysis used core R functions.

## Implementation of tests for modularity

To establish statistical tests to distinguish ‘modular’ from ‘continuous’ distributions given relatively few observations, we classified datasets as continuous or modular by modifying the gap statistic algorithm (@bib75). The gap statistic estimates the number of clusters (k~est~) that best account for the data in any given dataset ([Figure 1—figure supplement 1A-C](#fig1s1)). However, this estimate may be prone to false positives, particularly where the numbers of observations are low. We therefore introduced a thresholding mechanism for tuning the sensitivity of the algorithm so that the false-positive rate, which is the rate of misclassifying datasets drawn from continuous (uniform) distributions as ‘modular’, is low, constant across different numbers of cluster modes and insensitive to dataset size ([Figure 1—figure supplement 1D-G](#fig1s1)). With this approach, we are able to estimate whether a dataset is best described as lacking modularity (k~est~ = 1), or having a given number of modes (k~est~ > 1). Below, we describe tests carried out to validate the approach.

To illustrate the sensitivity and specificity of the modified gap statistic algorithm, we applied it to simulated datasets drawn either from a uniform distribution (k = 1, n = 40) or from a bimodal distribution with separation between the modes of five standard deviations (k = 2, n = 40, sigma = 5) ([Figure 1—figure supplement 2A](#fig1s2)). We set the thresholding mechanism so that k~est~ for each distinct k (where k~est~ ≥2) has a false-positive rate of 0.01. In line with this, testing for 2 ≤ k~est~ ≤ 8 (the maximum k expected to occur in grid spacing in the MEC), across multiple (N = 1000) simulated datasets drawn from the uniform distribution, produced a low false-positive rate (P(k~est~)≥2 = ~0.07), whereas when the data were drawn from the bimodal distribution, the ability to detect modular organization (p~detect~) was good (P\[k~est~]≥2 = ~0.8) ([Figure 1—figure supplement 2B](#fig1s2)). The performance of the statistic improved with larger separation between clusters and with greater numbers of data points per dataset ([Figure 1—figure supplement 2C](#fig1s2)) and is relatively insensitive to the numbers of clusters ([Figure 1—figure supplement 2D](#fig1s2)). The algorithm maintains high rates of p~detect~ when modes have varying densities and when sigma between modes varies in a manner similar to grid spacing data ([Figure 1—figure supplement 3](#fig1s3)).

## Analysis of extracellular recording data from other laboratories

Recently described algorithms (@bib32; @bib71) address the problem of identifying modularity when data are sampled from multiple locations and data values vary as a function of location, as is the case for the mean spacing of grid fields for cells at different dorsoventral locations recorded in behaving animals using tetrodes (@bib42). They generate log-normalized discontinuity (which we refer to here as lnDS) or discreteness scores, which are the log of the ratio of discontinuity or discreteness scores for the data points of interest and for the sampling locations, with positive values interpreted as evidence for clustering (@bib32; @bib71). However, in simulations of datasets generated from a uniform distribution with evenly spaced recording locations, we find that the lnDS is always greater than zero ([Figure 1—figure supplement 4A](#fig1s4)). This is because evenly spaced locations result in a discontinuity score that approaches zero, and therefore the log ratio of the discontinuity of the data to this score will be positive. Thus, for evenly spaced data, the lnDS is guaranteed to produce false-positive results. When locations are instead sampled from a uniform distribution, approximately half of the simulated datasets have a log discontinuity ratio greater than 0 ([Figure 1—figure supplement 4A](#fig1s4)), which in previous studies would be interpreted as evidence of modularity (@bib32). Similar discrepancies arise for the discreteness measure (@bib71). To address these issues, we introduced a log discontinuity ratio threshold, so that the discontinuity method could be matched to produce a similar false-positive rate to the adapted gap statistic algorithm used in the example above. After including this modification, we found that for a given false-positive rate, the adapted gap statistic is more sensitive at detecting modularity in the simulated datasets ([Figure 4—figure supplement 1B](#fig4s1)).

To establish whether the modified gap statistic detects clustering in experimental data, we applied it to previously published grid cell data recorded with tetrodes from awake behaving animals (@bib71). We find that the modified gap statistic identified clustered grid spacing for 6 of 7 animals previously identified as having grid modules and with n ≥ 20. For these animals, the number of modules was similar (but not always identical) to the number of previously identified modules ([Figure 1—figure supplement 5](#fig1s5)). By contrast, the modified gap statistic does not identify clustering in five of six sets of recording locations, confirming that the grid clustering is likely not a result of uneven sampling of locations (we could not test the seventh as location data were not available). The thresholded discontinuity score also detects clustering in the same five of the six tested sets of grid data. From the six grid datasets detected as clustered with the modified gap statistic, we estimated the separation between clusters by fitting the data with a mixture of Gaussians, with the number of modes set by the value of k obtained with the modified gap statistic. This analysis suggested that the largest spacing between contiguous modules in each mouse is always >5.6 standard deviations (mean = 20.5 ± 5.0 standard deviations). Thus, the modified gap statistic detects modularity within the grid system and indicates that previous descriptions of grid modularity are, in general, robust to the possibility of false positives associated with the discreteness and discontinuity methods.