Hierarchical Successive Removal
Jags code for chapter 10.2
hierarchical-successive-removal.Rmd
library(hbm4ecology)
library(rjags)
library(posterior)
library(tidyverse)
library(bayesplot)
library(GGally)
library(ggExtra)
The model relies on successive discrete (Binomial) distributions whose size parameter can get very low, which renders the Normal approximation unreliable. Using Stan with Hamiltonian Monte Carlo would require marginalizing over the latent population sizes in the successive Binomial distributions, which involves some burdensome calculus. Hence we will use JAGS with standard MCMC.
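To see why, note that the latent population size \(\nu\) is a discrete variable that Hamiltonian Monte Carlo cannot sample directly. A Stan implementation would therefore have to evaluate, for each site, a marginal likelihood of the form below (a sketch using the notation introduced later: \(\pi\) is the capture efficiency, \(\delta\) the fish density and \(S\) the fished area):
\[ p(C^{1},C^{2},C^{3} \mid \pi,\delta,S) = \sum_{\nu \ge C^{1}+C^{2}+C^{3}} Poisson(\nu \mid \delta S)\, Binomial(C^{1} \mid \nu,\pi)\, Binomial(C^{2} \mid \nu-C^{1},\pi)\, Binomial(C^{3} \mid \nu-C^{1}-C^{2},\pi) \]
The infinite sum over \(\nu\) must be truncated in practice, whereas JAGS simply samples \(\nu\) as a discrete node.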
Data and motivation
We turn to the successive-removal salmon data from the Nivelle, a 40-km river flowing from the Spanish Pyrénées to the bay of Saint-Jean-de-Luz. Details about the fishing procedure can be found in Chapter 4.2 (page 83) of Parent and Prévost (2013). We work with a complete dataset involving 3 years (2003-2005) and 11 sites located from the Nivelle River mouth to impassable upstream dams on the main stream and one of its tributaries. For each of the 33 experiments, the number of fish caught was recorded for the first (\(C^{1}\)), second (\(C^{2}\)) and possibly third pass (\(C^{3}\)) of the electrofishing experiment. In addition, we provide the area \(S\) in \(m^{2}\) that was fished each time.
data("SucRemNivelle")
tibble(Site = seq(1,33)) %>%
bind_cols(bind_rows(SucRemNivelle[2:5])) %>%
knitr::kable(caption = "Three passes during successive removal electrofishing
on the Nivelle River (11 sites, years 2003-2004; NA = Not Available")
Site | C1 | C2 | C3 | S |
---|---|---|---|---|
1 | 23 | 14 | 3 | 608 |
2 | 35 | 7 | NA | 294 |
3 | 38 | 17 | NA | 565 |
4 | 41 | 28 | 15 | 564 |
5 | 17 | 8 | NA | 229 |
6 | 24 | 11 | NA | 200 |
7 | 63 | 21 | NA | 693 |
8 | 101 | 25 | NA | 571 |
9 | 78 | 17 | NA | 950 |
10 | 21 | 9 | NA | 430 |
11 | 51 | 16 | NA | 636 |
12 | 14 | 5 | 2 | 375 |
13 | 19 | 1 | NA | 441 |
14 | 14 | 7 | 2 | 590 |
15 | 37 | 4 | NA | 665 |
16 | 22 | 15 | 2 | 228 |
17 | 27 | 5 | NA | 236 |
18 | 53 | 8 | 4 | 599 |
19 | 32 | 4 | NA | 547 |
20 | 35 | 2 | NA | 870 |
21 | 37 | 7 | NA | 302 |
22 | 26 | 8 | NA | 611 |
23 | 24 | 16 | 4 | 413 |
24 | 36 | 12 | NA | 473 |
25 | 27 | 26 | 5 | 591 |
26 | 142 | 81 | NA | 1095 |
27 | 55 | 5 | NA | 254 |
28 | 22 | 2 | NA | 288 |
29 | 73 | 10 | NA | 667 |
30 | 104 | 28 | NA | 606 |
31 | 190 | 43 | NA | 712 |
32 | 16 | 2 | NA | 341 |
33 | 17 | 4 | NA | 632 |
Model with independence between sites
The sampling distribution of the catch data is, for each site \(i\):
- \(C^1_i \sim Binomial(\nu_i, \pi_i)\)
- \(C^2_i \sim Binomial(\nu_i - C^1_i, \pi_i)\)
- \(C^3_i \sim Binomial(\nu_i - (C^1_i+C^2_i), \pi_i)\)
We assume a Beta distribution for the capture efficiency \(\pi_i\):
- \(\pi_i \sim Beta(a,b)\), where we fix \(a=b=1\) for a uniform prior.
The population size \(\nu_i\) depends on the expected fish density \(\delta_i\), since the area \(S_i\) of each sampling site is recorded. We assume the fish to be randomly distributed in space within an infinitely large surface from which the site with finite surface \(S_i\) is randomly picked, and we further assign a Gamma prior distribution to the fish density \(\delta_i\) (a small simulation sketch of this data-generating process follows the bullets below):
- \(\nu_i \sim Poisson(\delta_iS_i)\)
- \(\delta_i \sim Gamma(c,d)\), with \(c=d=1\).
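Before turning to JAGS, here is a minimal R sketch (not part of the original analysis) that simulates one site's data from this generative model; the values of S, delta and pi_eff are arbitrary illustration choices:
set.seed(42)
S <- 500        # fished area in m^2 (arbitrary)
delta <- 0.15   # fish density per m^2 (arbitrary)
pi_eff <- 0.6   # capture efficiency (arbitrary)
nu <- rpois(1, delta * S)             # latent population size on the site
C1 <- rbinom(1, nu, pi_eff)           # first pass
C2 <- rbinom(1, nu - C1, pi_eff)      # second pass on the remaining fish
C3 <- rbinom(1, nu - C1 - C2, pi_eff) # third pass on what is left
c(nu = nu, C1 = C1, C2 = C2, C3 = C3)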
Jags implementation and diagnostics for the independent model
ind_model_str <- "model
{
for (i in 1:I)
{
# Prior
# -----
# Prior on capture efficiency
pi[i] ~ dbeta(1, 1)
# Prior on fish density
delta[i] ~ dgamma(1, 1)
# Number of fish on each site i (surface S[i] in the data set)
mu_nu[i] <- S[i] * delta[i]
nu[i] ~ dpois(mu_nu[i]) T(,500)
# Likelihood
# ----------
C1[i] ~ dbin(pi[i], nu[i])
nur1[i] <- nu[i] - C1[i]
C2[i] ~ dbin(pi[i], nur1[i])
nur2[i] <- nur1[i] - C2[i]
C3[i] ~ dbin(pi[i], nur2[i])
}
}"
set.seed(123)
ind_model <- jags.model(file = textConnection(ind_model_str),
data = SucRemNivelle,
# inits = replicate(3, ind_init, simplify = FALSE),
n.chains = 3)
#> Warning in jags.model(file = textConnection(ind_model_str), data =
#> SucRemNivelle, : Unused variable "AI" in data
#> Warning in jags.model(file = textConnection(ind_model_str), data =
#> SucRemNivelle, : Unused variable "I_pred" in data
#> Warning in jags.model(file = textConnection(ind_model_str), data =
#> SucRemNivelle, : Unused variable "AI_pred" in data
#> Compiling model graph
#> Resolving undeclared variables
#> Allocating nodes
#> Graph information:
#> Observed stochastic nodes: 74
#> Unobserved stochastic nodes: 124
#> Total graph size: 333
#>
#> Initializing model
# Inferences
update(ind_model, n.iter = 10000)
ind_posterior_sample <- coda.samples(ind_model,
variable.names = c("pi", "delta", "nu"),
n.iter = 10000,
thin = 10)
diagnostic <- summarise_draws(as_draws_df(ind_posterior_sample),
default_convergence_measures())
ggplot(diagnostic, aes(x = rhat)) +
geom_histogram() +
theme_classic(base_size = 15)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(diagnostic, aes(x = ess_bulk)) +
geom_histogram() +
theme_classic(base_size = 15)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
We notice some convergence issues on sites with insufficient information (e.g., sites 5, 10 or 26).
mcmc_trace(ind_posterior_sample, pars = c("delta[5]", "pi[5]", "delta[10]", "pi[10]"))
dfw_ind <- extract_wider(ind_posterior_sample)
Hierarchical modeling
We add an additional hierarchical layer that ties the experiments together. We do so by assigning prior distributions to \(a\), \(b\), \(c\) and \(d\). For the parameters \((a, b)\) we use a simple transformation by considering:
- \(\mu_{a,b} = \frac{a}{a+b}\) with \(\mu_{a,b} \sim Beta(1.5,1.5)\), which is the mean of the Beta distribution.
- \(u_{a,b} = a+b\) with \(\log(u_{a,b}) \sim Uniform(0,10)\) can be interpreted as a prior sample size that scales the variance of the Beta prior distribution.
Following the same line of reasoning, we use a change of variables for the parameters \(c\) and \(d\), with mean \(\mu_{\delta} = \frac{c}{d}\) and standard deviation \(\sigma_{\delta} = \frac{\sqrt{c}}{d}\) (the short helper sketch after this list makes both transformations explicit). Prior knowledge about juvenile salmon density suggests an informative prior for \(\mu_{\delta}\) and a diffuse prior for \(\sigma_{\delta}\):
- \(\mu_{\delta} \sim Gamma(1,1)\)
- \(\sigma_{\delta} \sim Gamma(1, 10^{-3})\).
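For reference, here is a minimal R sketch of these two changes of variables (the helpers beta_pars and gamma_pars are ours, introduced only for illustration; they mirror the deterministic nodes in the JAGS code below):
# (mu, u) -> (a, b): mu is the Beta mean, u the prior sample size
beta_pars <- function(mu, u) c(a = mu * u, b = (1 - mu) * u)
# (mu, sd) -> (c, d): shape and rate of a Gamma with mean mu and standard deviation sd
gamma_pars <- function(mu, sd) c(c = mu^2 / sd^2, d = mu / sd^2)
beta_pars(mu = 0.7, u = 10)       # gives a = 7, b = 3
gamma_pars(mu = 0.13, sd = 0.05)  # gives shape c and rate d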
Jags implementation and diagnostics for the hierarchical model
hier_model_str <- "model
{
# Priors
# ------
# Hyperprior for the trapping efficiency
mu_p ~ dbeta(1.5, 1.5)
log_u_p ~ dunif(0,10)
u_p <- exp(log_u_p)
a <- mu_p * u_p
b <- (1-mu_p) * u_p
# Compute standard deviation of p
var_p <- mu_p*(1-mu_p)*(1/(u_p+1))
sd_p <- sqrt(var_p)
# Hyperprior for the density
mu_d ~ dgamma(1,1)
sd_d ~ dgamma(1,0.001)
# Parameterization of the Gamma distribution
c <- pow(mu_d,2) / pow(sd_d,2)
d <- mu_d / pow(sd_d,2)
for (i in 1:I)
{
# Prior on capture efficiency
pi[i] ~ dbeta(a,b)
# Prior on fish density
delta[i] ~ dgamma(c,d)
# Number of fish (S[i] in the data set)
mu_nu[i] <- S[i] * delta[i]
nu[i] ~ dpois(mu_nu[i]) T(,500)
}
# Likelihood
# ----------
for (i in 1:I)
{
C1[i] ~ dbin(pi[i], nu[i])
nur1[i] <- nu[i] - C1[i]
C2[i] ~ dbin(pi[i], nur1[i])
nur2[i] <- nur1[i] - C2[i]
C3[i] ~ dbin(pi[i], nur2[i])
}
}
"
set.seed(123)
hier_model <- jags.model(
file = textConnection(hier_model_str),
data = SucRemNivelle,
n.chains = 3)
#> Warning in jags.model(file = textConnection(hier_model_str), data =
#> SucRemNivelle, : Unused variable "AI" in data
#> Warning in jags.model(file = textConnection(hier_model_str), data =
#> SucRemNivelle, : Unused variable "I_pred" in data
#> Warning in jags.model(file = textConnection(hier_model_str), data =
#> SucRemNivelle, : Unused variable "AI_pred" in data
#> Compiling model graph
#> Resolving undeclared variables
#> Allocating nodes
#> Graph information:
#> Observed stochastic nodes: 74
#> Unobserved stochastic nodes: 128
#> Total graph size: 354
#>
#> Initializing model
# Inferences
update(hier_model, n.iter = 10000)
hier_posterior_sample <- coda.samples(
hier_model,
variable.names = c("a", "b", "c", "d", "mu_d", "mu_p",
"pi", "delta", "nu"),
n.iter = 10000,
thin = 10)
summarise_draws(as_draws_df(hier_posterior_sample),
default_convergence_measures()) %>%
filter(variable %in% c("a", "b", "c", "d", "mu_d", "mu_p")) %>%
knitr::kable()
variable | rhat | ess_bulk | ess_tail |
---|---|---|---|
a | 1.0000086 | 2207.324 | 2654.666 |
b | 1.0001652 | 2283.140 | 2688.094 |
c | 1.0000427 | 2800.646 | 2710.848 |
d | 0.9999242 | 2875.790 | 2885.734 |
mu_d | 0.9999451 | 2880.455 | 2597.036 |
mu_p | 1.0006883 | 2502.721 | 2700.134 |
mcmc_trace(hier_posterior_sample,
pars = c("delta[5]", "pi[5]",
"delta[10]", "pi[10]"))
Analysis
dfw_hier <- extract_wider(hier_posterior_sample)
We draw the posterior distribution for the model parameters:
dfw_hier %>%
select("a", "b", "c", "d", "mu_d", "mu_p") %>%
GGally::ggpairs(
columnLabels = c("a", "b", "c", "d", "mu[delta]", "mu[pi]"),
labeller = "label_parsed"
) +
theme_linedraw(base_size = 15L, base_line_size = 0) +
theme(axis.ticks = element_line(size = 15/22))
The expected efficiency of capture \(\mu_{\pi}\) lies around \(0.7\) and the \(\pi_i\)'s are dispersed among sites approximately like a \(Beta(7,3)\). We gain good posterior knowledge about the expected density: \(\mu_{\delta}\) is about \(13\) individuals per 100 \(m^{2}\). In addition, one obtains a rather small standard deviation \(\sigma_{\delta}\) of fish densities among sites but, locally, the densities \(\delta_{i}\)'s are generally not precisely estimated.
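As a quick numerical check of the \(Beta(7,3)\) reading (taking \(a \approx 7\) and \(b \approx 3\) from the pairs plot above), the implied between-site mean and standard deviation of \(\pi\) are:
a <- 7; b <- 3  # approximate posterior means read from the pairs plot
c(mean = a / (a + b),                            # about 0.70
  sd = sqrt(a * b / ((a + b)^2 * (a + b + 1))))  # about 0.14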
The hierarchical model borrows strength from neighbors. Consequently, the problem of a banana-shaped joint distribution between \(\pi\) and \(\delta\) is reduced when recourse is made to hierarchical modeling. As an example, when considered in isolation, sites 5 and 10 exhibit poor determination of \(\pi\) and \(\delta\), as shown in the figures below. The data at site 5 or 10 alone are insufficient to opt for a large capture efficiency \(\pi\) associated with a small population size or for a small capture efficiency with a high local population density \(\delta\). Under the hierarchical model, though, sufficient transfer of information between sites helps specify the probable joint domain where \(\pi\) and \(\delta\) are to be expected.
p <- dfw_ind %>%
select(`pi.5`, `delta.5`) %>%
ggplot(aes(`delta.5`, `pi.5`)) +
geom_point(col = "red") +
geom_density_2d(col = "black") +
theme_bw(base_size = 15L, base_line_size = 0) +
ylim(c(0,1)) +
theme(legend.position="none")+
labs(x=expression(delta),
y=expression(pi),
title="Fish density vs Capture efficiency at site 5",
subtitle = "Independent model")
ggExtra::ggMarginal(p, type = "densigram", bins = 40)
#> Warning: Ignoring unknown parameters: bins
#> Ignoring unknown parameters: bins
#> Ignoring unknown parameters: bins
#> Ignoring unknown parameters: bins
p <- dfw_ind %>%
select(`pi.10`, `delta.10`) %>%
ggplot(aes(`delta.10`, `pi.10`)) +
geom_point(col = "red") +
geom_density_2d(col = "black") +
theme_bw(base_size = 15L, base_line_size = 0) +
ylim(c(0,1)) +
theme(legend.position="none")+
labs(x=expression(delta),
y=expression(pi),
title="Fish density vs Capture efficiency at site 10",
subtitle = "Independent model")
ggExtra::ggMarginal(p, type = "densigram", bins = 40)
#> Warning: Ignoring unknown parameters: bins
#> Ignoring unknown parameters: bins
#> Ignoring unknown parameters: bins
#> Ignoring unknown parameters: bins
phier <- dfw_hier %>%
select(`pi.5`, `delta.5`) %>%
ggplot(aes(`delta.5`, `pi.5`)) +
geom_point(col = "red") +
geom_density_2d(col = "black") +
ylim(c(0,1)) + xlim(c(0,0.3)) +
theme_bw(base_size = 15L, base_line_size = 0) +
theme(legend.position="none")+
labs(x=expression(delta),
y=expression(pi),title="Fish density vs Capture efficiency at site 5",
subtitle = "Hierarchical model")
ggExtra::ggMarginal(phier, type = "densigram", bins = 40)
#> Warning: Ignoring unknown parameters: bins
#> Ignoring unknown parameters: bins
#> Ignoring unknown parameters: bins
#> Ignoring unknown parameters: bins
phier <- dfw_hier %>%
select(`pi.10`, `delta.10`) %>%
ggplot(aes(`delta.10`, `pi.10`)) +
geom_point(col = "red") +
geom_density_2d(col = "black") +
ylim(c(0,1)) + xlim(c(0,0.3)) +
theme_bw(base_size = 15L, base_line_size = 0) +
theme(legend.position="none")+
labs(x=expression(delta),
y=expression(pi),
title="Fish density vs Capture efficiency at site 10",
subtitle = "Hierarchical model")
ggExtra::ggMarginal(phier, type = "densigram", bins = 40)
#> Warning: Ignoring unknown parameters: bins
#> Ignoring unknown parameters: bins
#> Ignoring unknown parameters: bins
#> Ignoring unknown parameters: bins
Combining successive removals and a rapid sampling technique
Removal sampling by electric fishing is time- and manpower-consuming, because it is difficult to store the removed fish and to go into the field three times for a large number of sites. To increase the number of sites sampled with a limited budget and diminishing human resources, recourse can be made to rapid electric fishing assessment techniques. A 5-minute sampling technique requires a single timed sample per site under a lighter protocol, but only provides relative abundance measures. To be used for absolute population abundance estimation, abundance indices (\(AI\)) first need to be calibrated against another method of population size estimation.
The \(AI\)'s recorded on the \(33\) sites of the dataset:
Site | AI |
---|---|
1 | 6 |
2 | 17 |
3 | 12 |
4 | 17 |
5 | 11 |
6 | 22 |
7 | 16 |
8 | 25 |
9 | 25 |
10 | 15 |
11 | 11 |
12 | 4 |
13 | 10 |
14 | 6 |
15 | 9 |
16 | 24 |
17 | 13 |
18 | 16 |
19 | 14 |
20 | 10 |
21 | 25 |
22 | 4 |
23 | NA |
24 | 24 |
25 | NA |
26 | 63 |
27 | NA |
28 | NA |
29 | 21 |
30 | 41 |
31 | 44 |
32 | NA |
33 | 5 |
To perform such a calibration in a consistent Bayesian framework, we need to model the successive removals and the \(AI\)'s jointly. There is a strong link between the observed abundance indices and the mean of the posterior pdf of \(\delta\), which we take as an estimate of the fish density at each site.
dfl_hier <- extract_longer(hier_posterior_sample) %>% mutate(Model = "Hierarchical")
df_delta_AI <- dfl_hier %>%
filter(str_detect(parameter, "^delta\\.")) %>%
rename("delta" = value) %>%
mutate(Site = as_factor(str_sub(parameter, 7))) %>%
group_by(Site) %>%
summarise(mean_delta = mean(delta)) %>%
mutate(abundance = SucRemNivelle$AI)
df_delta_AI %>%
ggplot(aes(x = abundance, y = mean_delta)) +
ylim(c(0, .4)) + xlim(c(0, 70)) +
geom_point() +
xlab("Abundance indices") +
ylab("Estimated fish density") +
theme_classic(base_size = 15L) +
labs(caption = "Posterior means of Salmon juvenile density (under hierarchical bayesian model) versus observed Abundance Indices.")
#> Warning: Removed 5 rows containing missing values (geom_point).
We can discern a linear trend, but with increasing dispersion. Contrary to the homoscedastic behavior of the linear regression model, the variance should depend on the mean. Keeping these clues in mind, we relate the vector of salmon juveniles' densities \(\delta_{i}\) to the vector of abundance indices \(AI_{i}\) as follows:
- As we are counting events, we hypothesize for each site \(i\), a Poisson distribution for AI around a mean value \(\mu_{AI}:\)
\[ AI_{i}~\sim~Poisson(\mu_{AI_{i}}) \]
- In turn, this true mean value \(\mu_{AI}\) will be drawn from a Gamma distribution.
\[ \mu_{AI_{i}} ~ \sim~ Gamma(\alpha_{AI_{i}},\beta_{AI_{i}}) \]
Such a choice can be hypothesized for two reasons:
- A Gamma mixture of Poisson distributions yields a Negative Binomial pdf, a common model for overdispersion in counting experiments;
- By setting \(\beta_{AI_{i}}=f,\ \alpha_{AI_{i}}=k\times f\times\delta_{i}\), one can describe a variance increasing (linearly) with the mean since:
\[ \left\{ \begin{aligned} & \mathbb{E}(\mu_{AI_{i}}) = \frac{\alpha_{AI_{i}}}{\beta_{AI_{i}}} = k\delta_{i}\\ & \mathbb{V}(\mu_{AI_{i}}) = \frac{\alpha_{AI_{i}}}{\beta_{AI_{i}}^{2}} = \frac{\mathbb{E}(\mu_{AI_{i}})}{f} \end{aligned} \right. \]
The additional parameters \(k\) and \(f\) must be assigned a prior pdf. Parameter \(k\) is given a flat noninformative Gamma prior: \[ k~\sim~Gamma(1,10^{-3}) \] It is convenient to work with \(\phi\) such that \(f=\frac{\phi}{1-\phi}\): \(\phi\) is the probability parameter of the Negative Binomial, and we naturally opt for a uniform pdf: \[ \phi ~ \sim~Beta(1,1) \]
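As a sanity check (not part of the original analysis), the Poisson-Gamma construction can be verified by simulation: if \(\mu \sim Gamma(\alpha, f)\) and \(AI \mid \mu \sim Poisson(\mu)\), then \(AI\) follows a Negative Binomial with size \(\alpha\) and probability \(f/(1+f) = \phi\). The parameter values below are arbitrary illustration choices:
set.seed(1)
phi <- 0.6; f <- phi / (1 - phi)
alpha <- 5  # plays the role of k * f * delta[i]
mix <- rpois(1e5, rgamma(1e5, shape = alpha, rate = f))  # Poisson-Gamma mixture
nb <- rnbinom(1e5, size = alpha, prob = phi)             # direct Negative Binomial
c(mean(mix), mean(nb))  # both close to alpha / f
c(var(mix), var(nb))    # both close to mean / phi: overdispersed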
ai_model_str <- "model
{
# Priors
# ------
# Hyperprior for the trapping efficiency
mu_p ~ dbeta(1,1)
log_u_p ~ dunif(0,10)
u_p <- exp(log_u_p)
a <- mu_p * u_p
b <- (1-mu_p) * u_p
# Compute standard deviation of p
var_p <- mu_p*(1-mu_p)*(1/(u_p+1))
sd_p <- sqrt(var_p)
# Hyperprior for the density
mu_d ~ dgamma(1,1)
sd_d ~ dgamma(1,0.001)
# Parameterization of the Gamma distribution
c <- pow(mu_d,2) / pow(sd_d,2)
d <- mu_d / pow(sd_d,2)
for (i in 1:I)
{
# Prior on capture efficiency
pi[i] ~ dbeta(a,b)
# Prior on fish density
delta[i] ~ dgamma(c,d)
# Number of fish (S[i] in the data set)
mu_nu[i] <- S[i] * delta[i]
nu[i] ~ dpois(mu_nu[i]) T(,500)
}
# Hyperpriors for the mean abundance index
k ~ dgamma(1, 0.001)
phi ~ dbeta(1, 1)
f <- phi/(1-phi)
# Priors for the Abundance index
for (i in 1:I) {
alpha_AI[i] <- k*f*delta[i]
mu_AI[i] ~ dgamma(alpha_AI[i], f)
}
# Likelihood
# ----------
for (i in 1:I) {
C1[i] ~ dbin(pi[i], nu[i])
nur1[i] <- nu[i] - C1[i]
C2[i] ~ dbin(pi[i], nur1[i])
nur2[i] <- nur1[i] - C2[i]
C3[i] ~ dbin(pi[i], nur2[i])
AI[i] ~ dpois(mu_AI[i])
}
# Predictive distribution of the density conditionally upon
# observed abundance indices
for (i in 1:I_pred) {
AI_pred[i] ~ dpois(mu_AI_pred[i])
mu_AI_pred[i] ~ dgamma(alpha_AI_pred[i], f)
alpha_AI_pred[i] <- k * delta_pred[i] * f
delta_pred[i] ~ dgamma(c,d)
# Link with observed abundance indices
}
}
"
set.seed(123)
ai_model <- jags.model(file = textConnection(ai_model_str),
data = SucRemNivelle,
n.chains = 3)
#> Compiling model graph
#> Resolving undeclared variables
#> Allocating nodes
#> Graph information:
#> Observed stochastic nodes: 167
#> Unobserved stochastic nodes: 298
#> Total graph size: 717
#>
#> Initializing model
# Inferences
update(ai_model, n.iter = 10000)
ai_posterior_sample <- coda.samples(
ai_model,
variable.names = c("a", "b", "c", "d", "mu_d", "mu_p", "sd_d", "sd_p",
"pi", "delta", "nu", "phi", "k", "f", "mu_AI",
"delta_pred", "mu_AI_pred"),
n.iter = 10000,
thin = 10)
summarise_draws(as_draws_df(ai_posterior_sample),
default_convergence_measures()) %>%
filter(variable %in% c("a", "b", "c", "d",
                       "mu_d", "mu_p",
                       "phi", "k")) %>%
knitr::kable()
variable | rhat | ess_bulk | ess_tail |
---|---|---|---|
a | 0.9998840 | 2029.4588 | 2781.9576 |
b | 1.0002102 | 2294.4066 | 2829.9883 |
c | 1.0042604 | 1116.3178 | 2283.0722 |
d | 1.0094108 | 723.2803 | 1751.8044 |
k | 1.0139487 | 448.7799 | 790.3346 |
mu_d | 1.0113785 | 666.4471 | 1436.7747 |
mu_p | 0.9993467 | 2560.4025 | 2899.8039 |
phi | 1.0154339 | 335.2859 | 412.8917 |
mcmc_trace(ai_posterior_sample,
pars = c("delta[5]", "pi[5]",
"delta[10]", "pi[10]"))
Results and analysis
dfw_ai <- extract_wider(ai_posterior_sample)
dfw_ai %>%
select("mu_d", "sd_d", "mu_p", "sd_p", "phi", "k") %>%
GGally::ggpairs(
columnLabels = c("mu[delta]", "sigma[delta]", "mu[pi]", "sigma[pi]", "phi", "k"),
labeller = "label_parsed",
lower = list(continuous = wrap("points", alpha = 0.3,size = .5))
) +
theme_linedraw(base_size = 15L, base_line_size = 0) +
theme(axis.ticks = element_line(size = 15/22)) +
labs(caption = "Posterior distributions of parameters for the full AI + successive removal model.")
The statistical Bayesian learning machinery provides informative posterior distributions for \(\phi\) and \(k\):
- \(\phi\) is far from uniform: its posterior pdf concentrates around \(0.65\)
- \(k\) varies close to \(150\) with a small standard deviation (both readings can be checked numerically below)
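These readings can be checked directly on the draws (a minimal sketch using the dfw_ai object extracted above):
dfw_ai %>%
  summarise(across(c(phi, k), list(mean = mean, sd = sd)))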
dfl_ind <- extract_longer(ind_posterior_sample) %>% mutate(Model = "Independent")
dfl_ai <- extract_longer(ai_posterior_sample) %>% mutate(Model = "Abundance Index")
bind_rows(dfl_hier, dfl_ind, dfl_ai) %>%
filter(str_detect(parameter, "^delta\\.")) %>%
rename("delta" = value) %>%
mutate(Site = as_factor(str_sub(parameter, 7))) %>% # 7 = length("delta.") + 1
ggplot() +
aes(x = Site, y = delta, fill = Model ) +
geom_boxplot(outlier.shape = NA, outlier.size = .25) +
scale_x_discrete(guide = guide_axis(angle = 90)) +
ylim(c(0,0.75)) +
scale_fill_manual(values = c("gray10", "gray50", "gray90")) +
ylab("Fish density") +
theme_classic(base_size = 15L) +
labs(caption = expression("Marginal posterior distributions of salmon densities delta for each site index."))
#> Warning: Removed 2188 rows containing non-finite values (stat_boxplot).
The figure makes clear the shrinkage effect due to hierarchical modeling: with the hierarchical model, the densities are less dispersed around the grand mean. Sharing information between sites allows for more precise estimation of the local fish densities. The black boxplots depict the posterior pdf of fish density \(\delta\) when modeling the successive removals and the \(AI\)'s jointly. Compared to the previous model, which did not take into account the information conveyed by the abundance index (grey boxplots), there is only a small additional shrinkage effect: the 50% credible intervals are generally narrower, i.e., we learn something from the \(AI\) sample, but not that markedly. Site \(26\) is a noticeable exception, which can be explained by contradictory information brought by the successive removals and by the \(AI\). If we were to trust the successive-removal model only, the juvenile density for site \(26\) should be around \(0.3\) and, as \(k\) is close to \(150\), we should expect a value of \(150\times0.3=45\) individuals for the abundance index; but we actually observed \(63\) juveniles in the \(AI\) data!
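A rough back-of-the-envelope check (ours, taking \(k = 150\) and \(\delta_{26} \approx 0.3\) as above) confirms how surprising the observed index is under the successive-removal estimate alone:
# P(AI >= 63) under a Poisson with mean 150 * 0.3 = 45: about 0.006
ppois(62, lambda = 150 * 0.3, lower.tail = FALSE)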
dfl_ai %>%
filter(str_detect(parameter, "^delta_pred.")) %>%
mutate(AI = as.numeric(str_sub(parameter, 12))) %>%
rename("delta_pred" = value) %>%
group_by(AI) %>%
summarise(qs = quantile(delta_pred,
c(.025, .25, .5, .75, .975)),
prob = c(.025, .25, .5, .75, .975)) %>%
pivot_wider(names_from = prob, values_from = qs) %>%
ggplot(aes(x = AI, y = `0.5`)) +
geom_ribbon(aes(ymin = `0.025`, ymax = `0.975`), fill = "gray50") +
geom_ribbon(aes(ymin = `0.25`, ymax = `0.75`), fill = "gray75") +
geom_line() +
geom_point(data = df_delta_AI,
mapping = aes(x = abundance, y = mean_delta)) +
xlab("Abundance indices") +
ylab("Posterior predictive of fish density") +
theme_classic(base_size = 15L) +
labs(caption = "Predictive posterior distribution of Salmon juvenile density versus Abundance Indices (50% and 95% credible interval)")
#> `summarise()` has grouped output by 'AI'. You can override using the `.groups`
#> argument.
#> Warning: Removed 5 rows containing missing values (geom_point).
The figure shows that the index of abundance alone allows differentiation between contrasting levels of fish density, without recourse to the successive-removal technique in the future. Though fairly imprecise, this rapid sampling technique also provides credible bounds that match the cloud formed by the posterior means of salmon juvenile density (under the hierarchical model) plotted against the observed abundance indices.