Run SEFRA model
Charles Edwards and Tom Peatman
06 Apr 2025
Source:vignettes/run_model.Rmd
run_model.RmdLoad packages required for data preparation:
Prelims
Source data from sefraInputs:
sefra_data("inputsBio")## Loaded data:
##
##
## |name |description |created |version | id|
## |:---------|:-----------|:-------------------|:----------------------|--:|
## |inputsBio |reference |2025-03-27 12:25:55 |20250327T112555Z-ed41c | 2|
sefra_data("cryptic_capture_longline")## Loaded data:
##
##
## |name |description |created |version | id|
## |:------------------------|:-----------|:-------------------|:----------------------|--:|
## |cryptic_capture_longline |reference |2025-03-27 12:25:55 |20250327T112555Z-240cc | 1|
Extract biological data for reference case:
# Import demographic data
N_BP <- inputsBio[["N_BP"]]
S_opt <- inputsBio[["S_opt"]]
A_curr <- inputsBio[["A_curr"]]
P_B <- inputsBio[["P_B"]]
P_nest <- inputsBio[["p_nest"]]
P_southern <- inputsBio[["p_southern"]]The sefraData object
Data are stored in a S4 object of class
sefraData. The sefraData object is a list with
pre-defined components. When assigning data to this list, checks are
made during the assignment to ensure the data are in the correct format
for model input. The sefraData object is initialised using
a call to sefraData(<species>, <identifier>). A
vector of species must be supplied (see ?species for a list
of those available). By defining the species, all subsequent assignments
can be checked for consistency with this species list. An
identifier argument is also allowed, so that different data
configurations can be labelled. In this vignette, we demonstrate how an
sefraData object can be populated and input to a model
run.
First, initialise the data object using three example species:
## species codes input: including all possible capture codes
## constructed 'sefraData' object
When choosing the species we are assuming that all captures are from
one of these species, even if the capture code recorded in the capture
data refers to a lower taxonomic resolution. In the example below,
captures are also recorded using the generic BLZ code. The
model will assume that genus-, family- or generic-level captures are of
DIW, DQS or TWD; i.e., when
fitting to the data the captures will be predicted from the fishery
overlap with these species only. If generic captures could be of other
species not recorded at the species level, then these species should be
included in the sefraData object. In which case the model
will select a zero probability of observation for that species.
Following initialisation of the sefraData object, the
species names are available using the accessor function:
species_names(sefra_dat)## Species
## code common_name scientific_name
## 1 DIW Gibson's albatross Diomedea antipodensis gibsoni
## 2 DQS Antipodean albatross Diomedea antipodensis antipodensis
## 13 TWD New Zealand white-capped albatross Thalassarche cauta steadi
## genus family code_resolution id_species
## 1 Diomedea Diomedeidae species 1
## 2 Diomedea Diomedeidae species 2
## 13 Thalassarche Diomedeidae species 13
which can be useful for labeling plots of model diagnostics. Note
that the id_species corresponds to the order recorded in
?species. This numbering is retained throughout, allowing
the data inputs and model outputs to be matched consistently to the
species.
The relevant capture codes are automatically generated from the list of species, and will include the species-level capture codes, but also lower level taxonomic codes. In the current example:
capture_codes(sefra_dat)## 10 capture codes:
## (empty captures data frame)
## code id_code resolution id_resolution
## 1 DIW 1 species 1
## 2 DQS 2 species 1
## 3 TWD 13 species 1
## 4 DGA 26 complex 2
## 5 DST 29 complex 2
## 6 DWC 32 complex 2
## 7 DIZ 34 genus 3
## 8 THZ 35 genus 3
## 9 ALZ 38 family 4
## 10 BLZ 40 class 5
Assign biological data
From the inputsBio data accessed using
sefraInputs::sefra_data(), we assign these to the
sefraData object. Assignment functions are provided for
each data type, and automatically select the required species:
n_breeding_pairs(sefra_dat) <- N_BP
adult_survival(sefra_dat) <- S_opt
p_breeding(sefra_dat) <- P_B
age_breeding(sefra_dat) <- A_curr
p_nest(sefra_dat) <- P_nest
p_southern(sefra_dat) <- P_southernBiological inputs for the number of breeding pairs, the adult
survival, the probability of breeding, and the age at first breeding
should be provided as two-parameter probability density functions. These
can be one of: uniform, beta,
normal, log-normal or
logit-normal (see ?distributions). The input
data frame must contain the column headers distribution,
par1 and par2.
We can view the input data using, for example:
## id_species code distribution par1 par2
## 1 1 DIW log-normal 4425.00 0.050
## 2 2 DQS log-normal 3383.00 0.050
## 3 3 DIX log-normal 10130.00 0.050
## 4 4 DBN weibull 9.25 1710.000
## 5 5 DAM log-normal 60.00 0.100
## 6 6 DIP log-normal 5814.00 0.070
## 7 7 DIQ log-normal 4261.00 0.110
## 8 8 DCR log-normal 26800.00 0.100
## 9 9 TQH log-normal 33988.00 0.100
## 10 10 DIM log-normal 670960.00 0.050
## 11 11 TQW log-normal 14129.00 0.050
## 12 12 DCU log-normal 15335.00 0.100
## 13 13 TWD log-normal 85820.00 0.120
## 14 14 DKS log-normal 35242.00 0.050
## 15 15 DER log-normal 5294.00 0.010
## 16 16 DIC log-normal 63055.00 0.050
## 17 17 DSB log-normal 13493.00 0.050
## 18 18 DNB log-normal 19354.00 0.050
## 19 19 PHU weibull 23.20 13660.000
## 20 20 PHE log-normal 20927.00 0.100
## 21 21 PCI log-normal 105617.00 0.150
## 22 22 PRK log-normal 5456.00 0.057
## 23 23 PCW log-normal 6223.00 0.061
## 24 24 PRO log-normal 1317300.00 0.100
## 25 25 PCN log-normal 42000.00 0.096
The assigned model values are:
n_breeding_pairs(sefra_dat)## distribution par1 par2
## DIW log-normal 4425 0.05
## DQS log-normal 3383 0.05
## TWD log-normal 85820 0.12
Similarly:
## id_species code distribution par1 par2
## 1 1 DIW beta 0.595 170.00
## 2 2 DQS beta 0.450 91.30
## 3 3 DIX logit-normal 0.494 0.05
## 4 4 DBN beta 0.349 51.30
## 5 5 DAM logit-normal 0.600 0.05
## 6 6 DIP beta 0.531 22.20
## 7 7 DIQ beta 0.531 22.20
## 8 8 DCR beta 0.596 4100.00
## 9 9 TQH logit-normal 0.596 0.05
## 10 10 DIM beta 0.844 174.00
## 11 11 TQW logit-normal 0.900 0.05
## 12 12 DCU logit-normal 0.747 0.05
## 13 13 TWD beta 0.680 63.90
## 14 14 DKS beta 0.821 29.70
## 15 15 DER logit-normal 0.773 0.05
## 16 16 DIC beta 0.406 17.50
## 17 17 DSB beta 0.804 34.90
## 18 18 DNB logit-normal 0.800 0.05
## 19 19 PHU logit-normal 0.730 0.05
## 20 20 PHE beta 0.730 15.80
## 21 21 PCI logit-normal 0.900 0.05
## 22 22 PRK beta 0.610 143.00
## 23 23 PCW beta 0.480 45.40
## 24 24 PRO logit-normal 0.750 0.05
## 25 25 PCN logit-normal 0.797 0.05
p_breeding(sefra_dat)## distribution par1 par2
## DIW beta 0.595 170.0
## DQS beta 0.450 91.3
## TWD beta 0.680 63.9
If you wish to use alternate biological data values, please provide
these directly to the project team for inclusion in the
sefraInputs package. This will ensure they are available to
all participants. Centralisation of the data inputs further allows for
consistent cross referencing of different data supplied to different
model runs.
Fisheries and structural data
Fisheries input data are stored in the sefraData object
in three data frames:
- observed captures:
- observed overlap
- commercial (total) overlap
All the supplied data frames should contain the following headers:
-
monthas a character vector and/orid_monthas an integer value between1and12(see?months); -
yeara an integer value; -
fishery_groupas a character vector and/orid_fishery_groupas integer values.
In addition, the captures data frame must include the following field headers:
-
codeas a character vector and/orid_codeas integer values (see?codesfor permitted capture codes); -
n_captures, -
n_captures_aliveand, -
n_captures_deadas integer values.
The overlap data frames should include:
-
speciesas a character vector and/orid_speciesas integer values (see?speciesfor permitted capture codes); -
species_groupas a character vector and/orid_species_groupas integer values; -
celland/orid_cellas an integer value between1and1224, this being the number of 5x5 degree cells in the southern hemisphere. For assignment of data to cells, please see the documentation for thesefraInputspackage.
All months are referenced using an integer value between
1 and 12. This is true regardless of whether a
character vector is supplied. Similarly, all species are referenced
using an integer value between 1 and 25, and this
id_species value is retained throughout the assessment. The
capture code is referenced using an integer value for
id_code between 1 and 40, which is similarly
retained throughout the assessment and will be consistent across
different model runs or data constructs. If you wish to included
additional species or capture codes please contact the project team.
There are no fixed reference values for the fishery group or species
group, since these are dependent on the analysis and must be provided by
the investigator (see the vignette for sefraInputs).
Assignment functions are provided for the species group and fishery
group. These represent structural assumptions and must be supplied to
the sefraData object before the overlap and capture data
frames. This two-step process facilitates checking of the data.
To illustrate, we first create some artificial data, as three
separate data frames: captures_o, overlap_o
and overlap_t.
| n_captures | cell | year | code | month | flag | n_captures_alive | n_captures_dead | id_month | id_fishery_group |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 700 | 2008 | DQS | Sep | F2 | 0 | 0 | 9 | 2 |
| 0 | 327 | 2005 | DIZ | Aug | F2 | 0 | 0 | 8 | 2 |
| 0 | 704 | 2017 | DQS | Sep | F1 | 0 | 0 | 9 | 1 |
| 0 | 909 | 2006 | TWD | Sep | F2 | 0 | 0 | 9 | 2 |
| 0 | 186 | 2010 | DIW | Jun | F1 | 0 | 0 | 6 | 1 |
| 0 | 263 | 2015 | TWD | Sep | F1 | 0 | 0 | 9 | 1 |
| id_species | id_month | id_fishery_group | id_species_group | year | cell | overlap |
|---|---|---|---|---|---|---|
| 2 | 9 | 2 | 2 | 2008 | 700 | 0.0006489 |
| 2 | 8 | 2 | 2 | 2005 | 327 | 0.0003481 |
| 2 | 9 | 1 | 2 | 2017 | 704 | 0.0004360 |
| 13 | 9 | 2 | 3 | 2006 | 909 | 0.0009274 |
| 1 | 6 | 1 | 1 | 2010 | 186 | 0.0007493 |
| 13 | 9 | 1 | 3 | 2015 | 263 | 0.0005270 |
| id_species | id_month | id_fishery_group | id_species_group | year | cell | overlap |
|---|---|---|---|---|---|---|
| 2 | 9 | 2 | 2 | 2008 | 700 | 0.0009973 |
| 2 | 8 | 2 | 2 | 2005 | 327 | 0.0006618 |
| 2 | 9 | 1 | 2 | 2017 | 704 | 0.0005981 |
| 13 | 9 | 2 | 3 | 2006 | 909 | 0.0008382 |
| 1 | 6 | 1 | 1 | 2010 | 186 | 0.0000331 |
| 13 | 9 | 1 | 3 | 2015 | 263 | 0.0005699 |
Functions are provided to assign structural assumptions based on the input data frames. For example, we can assign fishery groups using:
fishery_groups(sefra_dat) <- captures_o
fishery_groups(sefra_dat)## 2 fishery groups
## fishery_group id_fishery_group
## 1 fishery_1 1
## 2 fishery_2 2
This does not assign any actual data. It only extracts structural
information from the data captures_o data frame. Structural
assumptions can also be provided as character vectors. In this case,
care must be taken to ensure the values match the values in the
corresponding input data frames. To specify species groups for example,
we could use:
species_groups(sefra_dat) <- c('group_1', 'group_2', 'group_3')The same results could be achieved by assigning a data frame directly, which is the recommended approach. In this case:
species_groups(sefra_dat) <- overlap_o
species_groups(sefra_dat)## 3 species groups
## species_group id_species_group
## 1 group_1 1
## 2 group_2 2
## 3 group_3 3
When examining the species group assignments to each species, we can use:
species_groups(sefra_dat, print_species = TRUE)## 3 species groups
## (with 3 species)
## species species_group id_species id_species_group
## 1 DIW group_1 1 1
## 2 DQS group_2 2 2
## 3 TWD group_3 13 3
The data frames themselves can then be assigned. In the first instance, overlap and captures data are provided:
## Prepared observer captures and overlap data
In this case, two separate data frames are supplied to the
data_prep() function. Multiple, data frame can be provided,
as long as the referencing is consistent. It is the presence of captures
(i.e., n_captures) in at least one of the data frames that
identifies the data as observed. Otherwise it is treated as commercial
overlap data.
Following assignment of capture data, the captures per code can be viewed using:
capture_codes(sefra_dat)## 10 capture codes:
## Joining with `by = join_by(id_code)`
## code id_code resolution id_resolution captures
## 1 DIW 1 species 1 26
## 2 DQS 2 species 1 23
## 3 TWD 13 species 1 21
## 4 DGA 26 complex 2 0
## 5 DST 29 complex 2 0
## 6 DWC 32 complex 2 0
## 7 DIZ 34 genus 3 19
## 8 THZ 35 genus 3 5
## 9 ALZ 38 family 4 0
## 10 BLZ 40 class 5 25
or by taxonomic resolution using:
capture_resolutions(sefra_dat)## 10 capture code resolutions:
## Joining with `by = join_by(id_fishery_group, id_code)`
## id_fishery_group id_resolution resolution captures
## 1 1 1 complex 0
## 2 1 1 species 10
## 3 1 2 genus 7
## 4 1 2 species 13
## 5 1 3 genus 4
## 6 1 3 species 7
## 7 1 4 complex 0
## 8 1 4 family 0
## 9 1 5 class 14
## 10 1 5 complex 0
## 11 2 1 complex 0
## 12 2 1 species 16
## 13 2 2 genus 12
## 14 2 2 species 10
## 15 2 3 genus 1
## 16 2 3 species 14
## 17 2 4 complex 0
## 18 2 4 family 0
## 19 2 5 class 11
## 20 2 5 complex 0
In the latter case, captures are disaggregated by fishery group.
To assign the commercial overlap data use:
## Prepared commercial (total) overlap data
To check the data have loaded correctly we can view the object:
sefra_dat## 'sefraData' class object:
## Species
## code common_name scientific_name
## 1 DIW Gibson's albatross Diomedea antipodensis gibsoni
## 2 DQS Antipodean albatross Diomedea antipodensis antipodensis
## 13 TWD New Zealand white-capped albatross Thalassarche cauta steadi
## genus family code_resolution id_species
## 1 Diomedea Diomedeidae species 1
## 2 Diomedea Diomedeidae species 2
## 13 Thalassarche Diomedeidae species 13
## 3 species groups
## (with 3 species)
## species species_group id_species id_species_group
## 1 DIW group_1 1 1
## 2 DQS group_2 2 2
## 3 TWD group_3 13 3
## 2 fishery groups
## fishery_group id_fishery_group
## 1 fishery_1 1
## 2 fishery_2 2
## 10 capture codes:
## Joining with `by = join_by(id_code)`
## code id_code resolution id_resolution captures
## 1 DIW 1 species 1 26
## 2 DQS 2 species 1 23
## 3 TWD 13 species 1 21
## 4 DGA 26 complex 2 0
## 5 DST 29 complex 2 0
## 6 DWC 32 complex 2 0
## 7 DIZ 34 genus 3 19
## 8 THZ 35 genus 3 5
## 9 ALZ 38 family 4 0
## 10 BLZ 40 class 5 25
## Captures data frame:
## # A tibble: 82 × 6
## captures_k captures_live_k captures_dead_k code_k month_k fishery_group_k
## <int> <int> <int> <int> <int> <int>
## 1 3 0 3 1 1 1
## 2 1 0 1 1 4 1
## 3 1 0 1 1 5 1
## 4 1 0 1 1 6 1
## 5 1 1 0 1 8 1
## 6 2 1 1 1 10 1
## 7 1 1 0 1 11 1
## 8 2 2 0 1 1 2
## 9 3 0 3 1 2 2
## 10 1 1 0 1 4 2
## # ℹ 72 more rows
## Observed overlap data frame:
## # A tibble: 72 × 5
## overlap_i species_i species_group_i month_i fishery_group_i
## <dbl> <int> <int> <int> <int>
## 1 0.00413 1 1 1 1
## 2 0.00596 1 1 2 1
## 3 0.00827 1 1 3 1
## 4 0.00963 1 1 4 1
## 5 0.0108 1 1 5 1
## 6 0.00691 1 1 6 1
## 7 0.00653 1 1 7 1
## 8 0.00827 1 1 8 1
## 9 0.0104 1 1 9 1
## 10 0.00748 1 1 10 1
## # ℹ 62 more rows
## Commercial overlap data frame:
## # A tibble: 995 × 5
## overlap_j species_j species_group_j month_j fishery_group_j
## <dbl> <int> <int> <int> <int>
## 1 0.000975 1 1 1 1
## 2 0.000378 1 1 1 1
## 3 0.000891 1 1 1 1
## 4 0.000905 1 1 1 1
## 5 0.000111 1 1 1 1
## 6 0.000792 1 1 1 1
## 7 0.000567 1 1 1 1
## 8 0.000712 1 1 1 1
## 9 0.00000396 1 1 1 1
## 10 0.000947 1 1 1 1
## # ℹ 985 more rows
Captures and overlap data can also be extracted using, respectively:
captures(sefra_dat)## # A tibble: 82 × 6
## captures_k captures_live_k captures_dead_k code_k month_k fishery_group_k
## <int> <int> <int> <int> <int> <int>
## 1 3 0 3 1 1 1
## 2 1 0 1 1 4 1
## 3 1 0 1 1 5 1
## 4 1 0 1 1 6 1
## 5 1 1 0 1 8 1
## 6 2 1 1 1 10 1
## 7 1 1 0 1 11 1
## 8 2 2 0 1 1 2
## 9 3 0 3 1 2 2
## 10 1 1 0 1 4 2
## # ℹ 72 more rows
overlap(sefra_dat)## $observer
## # A tibble: 72 × 5
## overlap_i species_i species_group_i month_i fishery_group_i
## <dbl> <int> <int> <int> <int>
## 1 0.00413 1 1 1 1
## 2 0.00596 1 1 2 1
## 3 0.00827 1 1 3 1
## 4 0.00963 1 1 4 1
## 5 0.0108 1 1 5 1
## 6 0.00691 1 1 6 1
## 7 0.00653 1 1 7 1
## 8 0.00827 1 1 8 1
## 9 0.0104 1 1 9 1
## 10 0.00748 1 1 10 1
## # ℹ 62 more rows
##
## $fishery
## # A tibble: 995 × 5
## overlap_j species_j species_group_j month_j fishery_group_j
## <dbl> <int> <int> <int> <int>
## 1 0.000975 1 1 1 1
## 2 0.000378 1 1 1 1
## 3 0.000891 1 1 1 1
## 4 0.000905 1 1 1 1
## 5 0.000111 1 1 1 1
## 6 0.000792 1 1 1 1
## 7 0.000567 1 1 1 1
## 8 0.000712 1 1 1 1
## 9 0.00000396 1 1 1 1
## 10 0.000947 1 1 1 1
## # ℹ 985 more rows
Cryptic capture
Cryptic capture distribution parameters are defined by species group and fishery group, and can be assigned as follows:
cryptic_capture(sefra_dat) <- cryptic_capture_longline## 2 fishery groups
## 3 species groups
cryptic_capture(sefra_dat)## distribution fishery_group species_group par value
## 1 log-normal fishery_1 group_1 par1 1.420
## 2 log-normal fishery_2 group_1 par1 1.420
## 3 log-normal fishery_1 group_2 par1 1.420
## 4 log-normal fishery_2 group_2 par1 1.420
## 5 log-normal fishery_1 group_3 par1 1.420
## 6 log-normal fishery_2 group_3 par1 1.420
## 7 log-normal fishery_1 group_1 par2 0.186
## 8 log-normal fishery_2 group_1 par2 0.186
## 9 log-normal fishery_1 group_2 par2 0.186
## 10 log-normal fishery_2 group_2 par2 0.186
## 11 log-normal fishery_1 group_3 par2 0.186
## 12 log-normal fishery_2 group_3 par2 0.186
At present, only a log-normal distribution is allowed.
Values stored in the the cryptic_capture_longline object
represent the author’s best understanding of cryptic capture in longline
fisheries, but can be adjusted as required by the user. We have
stratified the cryptic captures assumption by species group, because we
are currently not aware of any information that would allow a higher
resolution.
Example model run
First, source and compile the model code:
mdl <- sefra(sefra_dat)## SEFRA-seabird model v2.4.0
## WARNING: Rtools is required to build R packages, but is not currently installed.
##
## Please download and install the appropriate version of Rtools for 4.4.3 from
## https://cran.r-project.org/bin/windows/Rtools/.
## Trying to compile a simple C file
## Running "C:/R/bin/x64/Rcmd.exe" SHLIB foo.c
## using C compiler: 'gcc.exe (GCC) 13.2.0'
## make[1]: Entering directory '/c/Users/ctted/AppData/Local/Temp/RtmpGsJ18v'
## gcc -I"C:/R/include" -DNDEBUG -I"C:/Rlibrary/Rcpp/include/" -I"C:/Rlibrary/RcppEigen/include/" -I"C:/Rlibrary/RcppEigen/include/unsupported" -I"C:/Rlibrary/BH/include" -I"C:/Rlibrary/StanHeaders/include/src/" -I"C:/Rlibrary/StanHeaders/include/" -I"C:/Rlibrary/RcppParallel/include/" -DRCPP_PARALLEL_USE_TBB=1 -I"C:/Rlibrary/rstan/include" -DEIGEN_NO_DEBUG -DBOOST_DISABLE_ASSERTS -DBOOST_PENDING_INTEGER_LOG2_HPP -DSTAN_THREADS -DUSE_STANC3 -DSTRICT_R_HEADERS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -D_HAS_AUTO_PTR_ETC=0 -include "C:/Rlibrary/StanHeaders/include/stan/math/prim/fun/Eigen.hpp" -std=c++1y -I"C:/rtools44/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -mfpmath=sse -msse2 -mstackrealign -c foo.c -o foo.o
## cc1.exe: warning: command-line option '-std=c++14' is valid for C++/ObjC++ but not for C
## In file included from C:/Rlibrary/RcppEigen/include/Eigen/Core:19,
## from C:/Rlibrary/RcppEigen/include/Eigen/Dense:1,
## from C:/Rlibrary/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22,
## from <command-line>:
## C:/Rlibrary/RcppEigen/include/Eigen/src/Core/util/Macros.h:679:10: fatal error: cmath: No such file or directory
## 679 | #include <cmath>
## | ^~~~~~~
## compilation terminated.
## make[1]: *** [C:/R/etc/x64/Makeconf:289: foo.o] Error 1
## make[1]: Leaving directory '/c/Users/ctted/AppData/Local/Temp/RtmpGsJ18v'
## WARNING: Rtools is required to build R packages, but is not currently installed.
##
## Please download and install the appropriate version of Rtools for 4.4.3 from
## https://cran.r-project.org/bin/windows/Rtools/.
The sefra() function can optionally be supplied with the
sefraData object as an argument. If commercial overlap is
missing from the data, then predicted captures, deaths and risk are not
calculated by the model. Only catchabilities and fits to the observed
captures will be generated. This will lower the memory requirements for
the model and may be useful for initial model-based explorations of the
data.
Note that a custom model can also be provided to
sefra(). In this instance we recommend first calling
sefra(write_to = <file>) to write the
stan code as a text file. This base code can then be
modified by the user and re-compiled using
sefra(read_from = <file>).
The function initial_values() can be used to create a
list of initial values. It can also be called with the model as an
argument, in which case a maximum posterior density estimate is
returned, and initial fits are plotted. To check initial values:
mdl_ini <- initial_values(sefra_dat, mdl, iter = 200)
This can be useful for quickly exploring the structural assumptions of the model, and whether reasonable fits to the data are obtained. Typically, structural mis-specification of the species and fishery groups, or a deficiency in the data, will lead to stronger prior updates on the biological parameters.
When fitting the model, captures are summed per fishery group and capture code. Following summation, capture codes represent all captures at that taxonomic resolution or higher. For example, using the current data, the captures per capture code and fishery group are represented within the model as a two-dimensional matrix, with the fishery groups along the first dimension:
## `summarise()` has grouped output by 'id_fishery_group'. You can override using
## the `.groups` argument.
## Joining with `by = join_by(id_fishery_group, code)`
## DIW DQS TWD DGA DST DWC DIZ THZ ALZ BLZ
## fishery_1 10 13 7 0 0 0 7 4 0 14
## fishery_2 16 10 14 0 0 0 12 1 0 11
The cumulative sum (or “inclusive sum”) of the capture codes used for fitting the model are:
## DIW DQS TWD DGA DST DWC DIZ THZ ALZ BLZ
## fishery_1 10 13 7 23 7 23 30 11 41 55
## fishery_2 16 10 14 26 14 26 38 15 53 64
For the BLZ capture code, there are 25 captures recorded
as BLZ in the empirical data, but a total of 119
BLZ captures when summed across captures recorded as
BLZ or a higher taxonomic resolution. We refer to 25 as the
empirical captures and 119 as the inclusive
captures. Similarly for DIZ we have 19
empirical captures and 68 inclusive captures.
This definition of inclusive captures is relevant to the
model fit and downstream diagnostics.
Model run
Run the model using a standard call from the rstan
package:
mdl_fit <- rstan::sampling(mdl, data = sefra_dat, verbose = TRUE, init = function() mdl_ini, chains = 2, iter = 200, control = list(adapt_delta = 0.95, stepsize = 0.1, max_treedepth = 100))## Warning: The largest R-hat is NA, indicating chains have not mixed.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#r-hat
## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#bulk-ess
## Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#tail-ess
The different scales on which biological parameters are estimated
makes it difficult for the MCMC chain to explore the full parameter
space, and we therefore recommend the above settings for the
control argument, which will assist convergence. We further
recommend that uniform priors should only be used for
p_breeding and adult_survival (if at all) and
not for n_breeding_pairs or age_breeding.
Functions exist to extract and summarise relevant model outputs, to allow the user to create their own diagnostic plots and tables:
out <- sefraOutputs(sefra_dat, mdl_fit)## constructed 'sefraOutputs' object
out_summary <- summary(out)These are not explored further here.
Model diagnostics
Trace plots can be used to visualise convergence. For example, we can plot trace diagnostics for the biological parameters using:
trace_plot(mdl_fit, pars = c("p_breeding"), labels = paste0("P[", sefra_dat$species, "]"))
trace_plot(mdl_fit, pars = c("age_breeding"), labels = paste0("A[", sefra_dat$species, "]"))
trace_plot(mdl_fit, pars = c("adult_survival"), labels = paste0("S[", sefra_dat$species, "]"))
trace_plot(mdl_fit, pars = c("n_breeding_pairs"), labels = paste0("N[", sefra_dat$species, "]"))
trace_plot(mdl_fit, pars = c("p_obs_logit"))
To visualise prior updates to the biological input distributions, we use, for example:
plot_prior_update(mdl_fit, par = "n_breeding_pairs", labels = sefra_dat$species)
Only one biological parameter can be included per box-plot.
Diagnostic plots also exist for Rhat and Neff. For example:
par_labels <- expression(alpha[1], alpha[2], alpha[1][","][1], alpha[2][","][1], alpha[1][","][2], alpha[2][","][2], alpha[1][","][3], alpha[2][","][3], beta[1], beta[2], beta[1][","][1], beta[2][","][1], beta[1][","][2], beta[2][","][2], beta[1][","][3], beta[2][","][3])
plot_rhat(mdl_fit, pars = c("alpha_fishery_group", "alpha_species_group", "beta_fishery_group", "beta_species_group"), labels = par_labels)
plot_neff(mdl_fit, pars = c("alpha_fishery_group", "alpha_species_group", "beta_fishery_group", "beta_species_group"), labels = par_labels)
Perhaps the most useful diagnostic is provided from examination of fits to the capture data, and a specific function exists for this purpose:
captures(sefra_dat, mdl_fit)## $captures
## # A tibble: 20 × 7
## fishery_group code captures hat med low_ci upp_ci
## <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl>
## 1 fishery_1 DIW 10 6.7 6 1 14.0
## 2 fishery_1 DQS 13 8.62 8 2 15.0
## 3 fishery_1 TWD 7 4.26 4 1 8.03
## 4 fishery_1 DGA 0 4.00 4 0 10
## 5 fishery_1 DST 0 3.04 3 0 8.03
## 6 fishery_1 DWC 0 4.31 4 0.975 9
## 7 fishery_1 DIZ 7 5.63 5 0 13
## 8 fishery_1 THZ 4 3.1 3 0 7.03
## 9 fishery_1 ALZ 0 9.38 9 2 19.1
## 10 fishery_1 BLZ 14 11.4 11 3 22.0
## 11 fishery_2 DIW 16 11.2 11 5 19.0
## 12 fishery_2 DQS 10 6.92 7 1 15
## 13 fishery_2 TWD 14 8.12 8 3 15
## 14 fishery_2 DGA 0 5.35 5 0 11.0
## 15 fishery_2 DST 0 3.88 3 0 11
## 16 fishery_2 DWC 0 5.21 5 0.975 12.0
## 17 fishery_2 DIZ 12 7.10 6 1 17.0
## 18 fishery_2 THZ 1 3.78 3 0 10
## 19 fishery_2 ALZ 0 10.1 9 3 21.0
## 20 fishery_2 BLZ 11 13.2 13 2.98 28
##
## $captures_inclusive
## # A tibble: 20 × 7
## fishery_group code captures hat med low_ci upp_ci
## <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl>
## 1 fishery_1 DIW 10 6.7 6 1 13
## 2 fishery_1 DQS 13 8.48 8 2 17.0
## 3 fishery_1 TWD 7 4.45 4 1 9.03
## 4 fishery_1 DGA 23 19.0 19 10 29.1
## 5 fishery_1 DST 7 7.26 7 2.98 13
## 6 fishery_1 DWC 23 24.0 23 14.0 36
## 7 fishery_1 DIZ 30 28.8 29 18.0 43.1
## 8 fishery_1 THZ 11 10.5 10 4 18.0
## 9 fishery_1 ALZ 41 48.1 48 31 64
## 10 fishery_1 BLZ 55 60.7 60 45.0 81
## 11 fishery_2 DIW 16 11.0 11 3.98 19.0
## 12 fishery_2 DQS 10 6.84 6 2 14.0
## 13 fishery_2 TWD 14 8.43 8 3 16
## 14 fishery_2 DGA 26 22.9 23 14 34
## 15 fishery_2 DST 14 12.3 12 5 20.1
## 16 fishery_2 DWC 26 28.9 28 16 45
## 17 fishery_2 DIZ 38 35.2 35 22 51
## 18 fishery_2 THZ 15 16.4 16 8 27
## 19 fishery_2 ALZ 53 61.3 61 44.0 80
## 20 fishery_2 BLZ 64 74.6 74 58 95.1
Further diagnostic plots will be developed as the project progresses.