Skip to contents

Load packages required for data preparation:

Prelims

Source data from sefraInputs:

sefra_data("inputsBio")
## Loaded data:
## 
## 
## |name      |description |created             |version                | id|
## |:---------|:-----------|:-------------------|:----------------------|--:|
## |inputsBio |reference   |2025-03-27 12:25:55 |20250327T112555Z-ed41c |  2|
sefra_data("cryptic_capture_longline")
## Loaded data:
## 
## 
## |name                     |description |created             |version                | id|
## |:------------------------|:-----------|:-------------------|:----------------------|--:|
## |cryptic_capture_longline |reference   |2025-03-27 12:25:55 |20250327T112555Z-240cc |  1|

Extract biological data for reference case:

# Import demographic data
N_BP       <- inputsBio[["N_BP"]]
S_opt      <- inputsBio[["S_opt"]]
A_curr     <- inputsBio[["A_curr"]]
P_B        <- inputsBio[["P_B"]]
P_nest     <- inputsBio[["p_nest"]]
P_southern <- inputsBio[["p_southern"]]

The sefraData object

Data are stored in a S4 object of class sefraData. The sefraData object is a list with pre-defined components. When assigning data to this list, checks are made during the assignment to ensure the data are in the correct format for model input. The sefraData object is initialised using a call to sefraData(<species>, <identifier>). A vector of species must be supplied (see ?species for a list of those available). By defining the species, all subsequent assignments can be checked for consistency with this species list. An identifier argument is also allowed, so that different data configurations can be labelled. In this vignette, we demonstrate how an sefraData object can be populated and input to a model run.

First, initialise the data object using three example species:

sefra_dat <- sefraData(c('DIW', 'DQS', 'TWD'), name = "reference")
## species codes input: including all possible capture codes
## constructed 'sefraData' object

When choosing the species we are assuming that all captures are from one of these species, even if the capture code recorded in the capture data refers to a lower taxonomic resolution. In the example below, captures are also recorded using the generic BLZ code. The model will assume that genus-, family- or generic-level captures are of DIW, DQS or TWD; i.e., when fitting to the data the captures will be predicted from the fishery overlap with these species only. If generic captures could be of other species not recorded at the species level, then these species should be included in the sefraData object. In which case the model will select a zero probability of observation for that species.

Following initialisation of the sefraData object, the species names are available using the accessor function:

species_names(sefra_dat)
## Species
##    code                        common_name                    scientific_name
## 1   DIW                 Gibson's albatross      Diomedea antipodensis gibsoni
## 2   DQS               Antipodean albatross Diomedea antipodensis antipodensis
## 13  TWD New Zealand white-capped albatross          Thalassarche cauta steadi
##           genus      family code_resolution id_species
## 1      Diomedea Diomedeidae         species          1
## 2      Diomedea Diomedeidae         species          2
## 13 Thalassarche Diomedeidae         species         13

which can be useful for labeling plots of model diagnostics. Note that the id_species corresponds to the order recorded in ?species. This numbering is retained throughout, allowing the data inputs and model outputs to be matched consistently to the species.

The relevant capture codes are automatically generated from the list of species, and will include the species-level capture codes, but also lower level taxonomic codes. In the current example:

capture_codes(sefra_dat)
## 10 capture codes:
## (empty captures data frame)
##    code id_code resolution id_resolution
## 1   DIW       1    species             1
## 2   DQS       2    species             1
## 3   TWD      13    species             1
## 4   DGA      26    complex             2
## 5   DST      29    complex             2
## 6   DWC      32    complex             2
## 7   DIZ      34      genus             3
## 8   THZ      35      genus             3
## 9   ALZ      38     family             4
## 10  BLZ      40      class             5

Assign biological data

From the inputsBio data accessed using sefraInputs::sefra_data(), we assign these to the sefraData object. Assignment functions are provided for each data type, and automatically select the required species:

n_breeding_pairs(sefra_dat) <- N_BP       
adult_survival(sefra_dat)   <- S_opt      
p_breeding(sefra_dat)       <- P_B        
age_breeding(sefra_dat)     <- A_curr     
p_nest(sefra_dat)           <- P_nest
p_southern(sefra_dat)       <- P_southern

Biological inputs for the number of breeding pairs, the adult survival, the probability of breeding, and the age at first breeding should be provided as two-parameter probability density functions. These can be one of: uniform, beta, normal, log-normal or logit-normal (see ?distributions). The input data frame must contain the column headers distribution, par1 and par2.

We can view the input data using, for example:

N_BP %>% select(id_species, code, distribution, par1, par2)
##    id_species code distribution       par1      par2
## 1           1  DIW   log-normal    4425.00     0.050
## 2           2  DQS   log-normal    3383.00     0.050
## 3           3  DIX   log-normal   10130.00     0.050
## 4           4  DBN      weibull       9.25  1710.000
## 5           5  DAM   log-normal      60.00     0.100
## 6           6  DIP   log-normal    5814.00     0.070
## 7           7  DIQ   log-normal    4261.00     0.110
## 8           8  DCR   log-normal   26800.00     0.100
## 9           9  TQH   log-normal   33988.00     0.100
## 10         10  DIM   log-normal  670960.00     0.050
## 11         11  TQW   log-normal   14129.00     0.050
## 12         12  DCU   log-normal   15335.00     0.100
## 13         13  TWD   log-normal   85820.00     0.120
## 14         14  DKS   log-normal   35242.00     0.050
## 15         15  DER   log-normal    5294.00     0.010
## 16         16  DIC   log-normal   63055.00     0.050
## 17         17  DSB   log-normal   13493.00     0.050
## 18         18  DNB   log-normal   19354.00     0.050
## 19         19  PHU      weibull      23.20 13660.000
## 20         20  PHE   log-normal   20927.00     0.100
## 21         21  PCI   log-normal  105617.00     0.150
## 22         22  PRK   log-normal    5456.00     0.057
## 23         23  PCW   log-normal    6223.00     0.061
## 24         24  PRO   log-normal 1317300.00     0.100
## 25         25  PCN   log-normal   42000.00     0.096

The assigned model values are:

n_breeding_pairs(sefra_dat)
##     distribution  par1 par2
## DIW   log-normal  4425 0.05
## DQS   log-normal  3383 0.05
## TWD   log-normal 85820 0.12

Similarly:

P_B %>% select(id_species, code, distribution, par1, par2)
##    id_species code distribution  par1    par2
## 1           1  DIW         beta 0.595  170.00
## 2           2  DQS         beta 0.450   91.30
## 3           3  DIX logit-normal 0.494    0.05
## 4           4  DBN         beta 0.349   51.30
## 5           5  DAM logit-normal 0.600    0.05
## 6           6  DIP         beta 0.531   22.20
## 7           7  DIQ         beta 0.531   22.20
## 8           8  DCR         beta 0.596 4100.00
## 9           9  TQH logit-normal 0.596    0.05
## 10         10  DIM         beta 0.844  174.00
## 11         11  TQW logit-normal 0.900    0.05
## 12         12  DCU logit-normal 0.747    0.05
## 13         13  TWD         beta 0.680   63.90
## 14         14  DKS         beta 0.821   29.70
## 15         15  DER logit-normal 0.773    0.05
## 16         16  DIC         beta 0.406   17.50
## 17         17  DSB         beta 0.804   34.90
## 18         18  DNB logit-normal 0.800    0.05
## 19         19  PHU logit-normal 0.730    0.05
## 20         20  PHE         beta 0.730   15.80
## 21         21  PCI logit-normal 0.900    0.05
## 22         22  PRK         beta 0.610  143.00
## 23         23  PCW         beta 0.480   45.40
## 24         24  PRO logit-normal 0.750    0.05
## 25         25  PCN logit-normal 0.797    0.05
p_breeding(sefra_dat)
##     distribution  par1  par2
## DIW         beta 0.595 170.0
## DQS         beta 0.450  91.3
## TWD         beta 0.680  63.9

If you wish to use alternate biological data values, please provide these directly to the project team for inclusion in the sefraInputs package. This will ensure they are available to all participants. Centralisation of the data inputs further allows for consistent cross referencing of different data supplied to different model runs.

Fisheries and structural data

Fisheries input data are stored in the sefraData object in three data frames:

  • observed captures:
  • observed overlap
  • commercial (total) overlap

All the supplied data frames should contain the following headers:

  • month as a character vector and/or id_month as an integer value between 1 and 12 (see ?months);
  • year a an integer value;
  • fishery_group as a character vector and/or id_fishery_group as integer values.

In addition, the captures data frame must include the following field headers:

  • code as a character vector and/or id_code as integer values (see ?codes for permitted capture codes);
  • n_captures,
  • n_captures_alive and,
  • n_captures_dead as integer values.

The overlap data frames should include:

  • species as a character vector and/or id_species as integer values (see ?species for permitted capture codes);
  • species_group as a character vector and/or id_species_group as integer values;
  • cell and/or id_cell as an integer value between 1 and 1224, this being the number of 5x5 degree cells in the southern hemisphere. For assignment of data to cells, please see the documentation for the sefraInputs package.

All months are referenced using an integer value between 1 and 12. This is true regardless of whether a character vector is supplied. Similarly, all species are referenced using an integer value between 1 and 25, and this id_species value is retained throughout the assessment. The capture code is referenced using an integer value for id_code between 1 and 40, which is similarly retained throughout the assessment and will be consistent across different model runs or data constructs. If you wish to included additional species or capture codes please contact the project team.

There are no fixed reference values for the fishery group or species group, since these are dependent on the analysis and must be provided by the investigator (see the vignette for sefraInputs).

Assignment functions are provided for the species group and fishery group. These represent structural assumptions and must be supplied to the sefraData object before the overlap and capture data frames. This two-step process facilitates checking of the data.

To illustrate, we first create some artificial data, as three separate data frames: captures_o, overlap_o and overlap_t.

Header from observed captures data frame
n_captures cell year code month flag n_captures_alive n_captures_dead id_month id_fishery_group
0 700 2008 DQS Sep F2 0 0 9 2
0 327 2005 DIZ Aug F2 0 0 8 2
0 704 2017 DQS Sep F1 0 0 9 1
0 909 2006 TWD Sep F2 0 0 9 2
0 186 2010 DIW Jun F1 0 0 6 1
0 263 2015 TWD Sep F1 0 0 9 1
Header from observed overlap data frame
id_species id_month id_fishery_group id_species_group year cell overlap
2 9 2 2 2008 700 0.0006489
2 8 2 2 2005 327 0.0003481
2 9 1 2 2017 704 0.0004360
13 9 2 3 2006 909 0.0009274
1 6 1 1 2010 186 0.0007493
13 9 1 3 2015 263 0.0005270
Header from commercial (total) overlap data frame
id_species id_month id_fishery_group id_species_group year cell overlap
2 9 2 2 2008 700 0.0009973
2 8 2 2 2005 327 0.0006618
2 9 1 2 2017 704 0.0005981
13 9 2 3 2006 909 0.0008382
1 6 1 1 2010 186 0.0000331
13 9 1 3 2015 263 0.0005699

Functions are provided to assign structural assumptions based on the input data frames. For example, we can assign fishery groups using:

fishery_groups(sefra_dat) <- captures_o
fishery_groups(sefra_dat)
## 2 fishery groups
##   fishery_group id_fishery_group
## 1     fishery_1                1
## 2     fishery_2                2

This does not assign any actual data. It only extracts structural information from the data captures_o data frame. Structural assumptions can also be provided as character vectors. In this case, care must be taken to ensure the values match the values in the corresponding input data frames. To specify species groups for example, we could use:

species_groups(sefra_dat) <- c('group_1', 'group_2', 'group_3')

The same results could be achieved by assigning a data frame directly, which is the recommended approach. In this case:

species_groups(sefra_dat) <- overlap_o
species_groups(sefra_dat)
## 3 species groups
##   species_group id_species_group
## 1       group_1                1
## 2       group_2                2
## 3       group_3                3

When examining the species group assignments to each species, we can use:

species_groups(sefra_dat, print_species = TRUE)
## 3 species groups
## (with 3 species)
##   species species_group id_species id_species_group
## 1     DIW       group_1          1                1
## 2     DQS       group_2          2                2
## 3     TWD       group_3         13                3

The data frames themselves can then be assigned. In the first instance, overlap and captures data are provided:

sefra_dat <- sefra_dat %>% data_prep(overlap_o, captures_o)
## Prepared observer captures and overlap data

In this case, two separate data frames are supplied to the data_prep() function. Multiple, data frame can be provided, as long as the referencing is consistent. It is the presence of captures (i.e., n_captures) in at least one of the data frames that identifies the data as observed. Otherwise it is treated as commercial overlap data.

Following assignment of capture data, the captures per code can be viewed using:

capture_codes(sefra_dat)
## 10 capture codes:
## Joining with `by = join_by(id_code)`
##    code id_code resolution id_resolution captures
## 1   DIW       1    species             1       26
## 2   DQS       2    species             1       23
## 3   TWD      13    species             1       21
## 4   DGA      26    complex             2        0
## 5   DST      29    complex             2        0
## 6   DWC      32    complex             2        0
## 7   DIZ      34      genus             3       19
## 8   THZ      35      genus             3        5
## 9   ALZ      38     family             4        0
## 10  BLZ      40      class             5       25

or by taxonomic resolution using:

## 10 capture code resolutions:
## Joining with `by = join_by(id_fishery_group, id_code)`
##    id_fishery_group id_resolution resolution captures
## 1                 1             1    complex        0
## 2                 1             1    species       10
## 3                 1             2      genus        7
## 4                 1             2    species       13
## 5                 1             3      genus        4
## 6                 1             3    species        7
## 7                 1             4    complex        0
## 8                 1             4     family        0
## 9                 1             5      class       14
## 10                1             5    complex        0
## 11                2             1    complex        0
## 12                2             1    species       16
## 13                2             2      genus       12
## 14                2             2    species       10
## 15                2             3      genus        1
## 16                2             3    species       14
## 17                2             4    complex        0
## 18                2             4     family        0
## 19                2             5      class       11
## 20                2             5    complex        0

In the latter case, captures are disaggregated by fishery group.

To assign the commercial overlap data use:

sefra_dat <- sefra_dat %>% data_prep(overlap_t)
## Prepared commercial (total) overlap data

To check the data have loaded correctly we can view the object:

sefra_dat
## 'sefraData' class object:
## Species
##    code                        common_name                    scientific_name
## 1   DIW                 Gibson's albatross      Diomedea antipodensis gibsoni
## 2   DQS               Antipodean albatross Diomedea antipodensis antipodensis
## 13  TWD New Zealand white-capped albatross          Thalassarche cauta steadi
##           genus      family code_resolution id_species
## 1      Diomedea Diomedeidae         species          1
## 2      Diomedea Diomedeidae         species          2
## 13 Thalassarche Diomedeidae         species         13
## 3 species groups
## (with 3 species)
##   species species_group id_species id_species_group
## 1     DIW       group_1          1                1
## 2     DQS       group_2          2                2
## 3     TWD       group_3         13                3
## 2 fishery groups
##   fishery_group id_fishery_group
## 1     fishery_1                1
## 2     fishery_2                2
## 10 capture codes:
## Joining with `by = join_by(id_code)`
##    code id_code resolution id_resolution captures
## 1   DIW       1    species             1       26
## 2   DQS       2    species             1       23
## 3   TWD      13    species             1       21
## 4   DGA      26    complex             2        0
## 5   DST      29    complex             2        0
## 6   DWC      32    complex             2        0
## 7   DIZ      34      genus             3       19
## 8   THZ      35      genus             3        5
## 9   ALZ      38     family             4        0
## 10  BLZ      40      class             5       25
## Captures data frame:
## # A tibble: 82 × 6
##    captures_k captures_live_k captures_dead_k code_k month_k fishery_group_k
##         <int>           <int>           <int>  <int>   <int>           <int>
##  1          3               0               3      1       1               1
##  2          1               0               1      1       4               1
##  3          1               0               1      1       5               1
##  4          1               0               1      1       6               1
##  5          1               1               0      1       8               1
##  6          2               1               1      1      10               1
##  7          1               1               0      1      11               1
##  8          2               2               0      1       1               2
##  9          3               0               3      1       2               2
## 10          1               1               0      1       4               2
## # ℹ 72 more rows
## Observed overlap data frame:
## # A tibble: 72 × 5
##    overlap_i species_i species_group_i month_i fishery_group_i
##        <dbl>     <int>           <int>   <int>           <int>
##  1   0.00413         1               1       1               1
##  2   0.00596         1               1       2               1
##  3   0.00827         1               1       3               1
##  4   0.00963         1               1       4               1
##  5   0.0108          1               1       5               1
##  6   0.00691         1               1       6               1
##  7   0.00653         1               1       7               1
##  8   0.00827         1               1       8               1
##  9   0.0104          1               1       9               1
## 10   0.00748         1               1      10               1
## # ℹ 62 more rows
## Commercial overlap data frame:
## # A tibble: 995 × 5
##     overlap_j species_j species_group_j month_j fishery_group_j
##         <dbl>     <int>           <int>   <int>           <int>
##  1 0.000975           1               1       1               1
##  2 0.000378           1               1       1               1
##  3 0.000891           1               1       1               1
##  4 0.000905           1               1       1               1
##  5 0.000111           1               1       1               1
##  6 0.000792           1               1       1               1
##  7 0.000567           1               1       1               1
##  8 0.000712           1               1       1               1
##  9 0.00000396         1               1       1               1
## 10 0.000947           1               1       1               1
## # ℹ 985 more rows

Captures and overlap data can also be extracted using, respectively:

captures(sefra_dat)
## # A tibble: 82 × 6
##    captures_k captures_live_k captures_dead_k code_k month_k fishery_group_k
##         <int>           <int>           <int>  <int>   <int>           <int>
##  1          3               0               3      1       1               1
##  2          1               0               1      1       4               1
##  3          1               0               1      1       5               1
##  4          1               0               1      1       6               1
##  5          1               1               0      1       8               1
##  6          2               1               1      1      10               1
##  7          1               1               0      1      11               1
##  8          2               2               0      1       1               2
##  9          3               0               3      1       2               2
## 10          1               1               0      1       4               2
## # ℹ 72 more rows
overlap(sefra_dat)
## $observer
## # A tibble: 72 × 5
##    overlap_i species_i species_group_i month_i fishery_group_i
##        <dbl>     <int>           <int>   <int>           <int>
##  1   0.00413         1               1       1               1
##  2   0.00596         1               1       2               1
##  3   0.00827         1               1       3               1
##  4   0.00963         1               1       4               1
##  5   0.0108          1               1       5               1
##  6   0.00691         1               1       6               1
##  7   0.00653         1               1       7               1
##  8   0.00827         1               1       8               1
##  9   0.0104          1               1       9               1
## 10   0.00748         1               1      10               1
## # ℹ 62 more rows
## 
## $fishery
## # A tibble: 995 × 5
##     overlap_j species_j species_group_j month_j fishery_group_j
##         <dbl>     <int>           <int>   <int>           <int>
##  1 0.000975           1               1       1               1
##  2 0.000378           1               1       1               1
##  3 0.000891           1               1       1               1
##  4 0.000905           1               1       1               1
##  5 0.000111           1               1       1               1
##  6 0.000792           1               1       1               1
##  7 0.000567           1               1       1               1
##  8 0.000712           1               1       1               1
##  9 0.00000396         1               1       1               1
## 10 0.000947           1               1       1               1
## # ℹ 985 more rows

Cryptic capture

Cryptic capture distribution parameters are defined by species group and fishery group, and can be assigned as follows:

cryptic_capture(sefra_dat)  <- cryptic_capture_longline
## 2 fishery groups
## 3 species groups
cryptic_capture(sefra_dat)
##    distribution fishery_group species_group  par value
## 1    log-normal     fishery_1       group_1 par1 1.420
## 2    log-normal     fishery_2       group_1 par1 1.420
## 3    log-normal     fishery_1       group_2 par1 1.420
## 4    log-normal     fishery_2       group_2 par1 1.420
## 5    log-normal     fishery_1       group_3 par1 1.420
## 6    log-normal     fishery_2       group_3 par1 1.420
## 7    log-normal     fishery_1       group_1 par2 0.186
## 8    log-normal     fishery_2       group_1 par2 0.186
## 9    log-normal     fishery_1       group_2 par2 0.186
## 10   log-normal     fishery_2       group_2 par2 0.186
## 11   log-normal     fishery_1       group_3 par2 0.186
## 12   log-normal     fishery_2       group_3 par2 0.186

At present, only a log-normal distribution is allowed. Values stored in the the cryptic_capture_longline object represent the author’s best understanding of cryptic capture in longline fisheries, but can be adjusted as required by the user. We have stratified the cryptic captures assumption by species group, because we are currently not aware of any information that would allow a higher resolution.

Example model run

First, source and compile the model code:

mdl <- sefra(sefra_dat)
## SEFRA-seabird model v2.4.0
## WARNING: Rtools is required to build R packages, but is not currently installed.
## 
## Please download and install the appropriate version of Rtools for 4.4.3 from
## https://cran.r-project.org/bin/windows/Rtools/.
## Trying to compile a simple C file
## Running "C:/R/bin/x64/Rcmd.exe" SHLIB foo.c
## using C compiler: 'gcc.exe (GCC) 13.2.0'
## make[1]: Entering directory '/c/Users/ctted/AppData/Local/Temp/RtmpGsJ18v'
## gcc  -I"C:/R/include" -DNDEBUG   -I"C:/Rlibrary/Rcpp/include/"  -I"C:/Rlibrary/RcppEigen/include/"  -I"C:/Rlibrary/RcppEigen/include/unsupported"  -I"C:/Rlibrary/BH/include" -I"C:/Rlibrary/StanHeaders/include/src/"  -I"C:/Rlibrary/StanHeaders/include/"  -I"C:/Rlibrary/RcppParallel/include/" -DRCPP_PARALLEL_USE_TBB=1 -I"C:/Rlibrary/rstan/include" -DEIGEN_NO_DEBUG  -DBOOST_DISABLE_ASSERTS  -DBOOST_PENDING_INTEGER_LOG2_HPP  -DSTAN_THREADS  -DUSE_STANC3 -DSTRICT_R_HEADERS  -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION  -D_HAS_AUTO_PTR_ETC=0  -include "C:/Rlibrary/StanHeaders/include/stan/math/prim/fun/Eigen.hpp"  -std=c++1y    -I"C:/rtools44/x86_64-w64-mingw32.static.posix/include"     -O2 -Wall  -mfpmath=sse -msse2 -mstackrealign  -c foo.c -o foo.o
## cc1.exe: warning: command-line option '-std=c++14' is valid for C++/ObjC++ but not for C
## In file included from C:/Rlibrary/RcppEigen/include/Eigen/Core:19,
##                  from C:/Rlibrary/RcppEigen/include/Eigen/Dense:1,
##                  from C:/Rlibrary/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22,
##                  from <command-line>:
## C:/Rlibrary/RcppEigen/include/Eigen/src/Core/util/Macros.h:679:10: fatal error: cmath: No such file or directory
##   679 | #include <cmath>
##       |          ^~~~~~~
## compilation terminated.
## make[1]: *** [C:/R/etc/x64/Makeconf:289: foo.o] Error 1
## make[1]: Leaving directory '/c/Users/ctted/AppData/Local/Temp/RtmpGsJ18v'
## WARNING: Rtools is required to build R packages, but is not currently installed.
## 
## Please download and install the appropriate version of Rtools for 4.4.3 from
## https://cran.r-project.org/bin/windows/Rtools/.

The sefra() function can optionally be supplied with the sefraData object as an argument. If commercial overlap is missing from the data, then predicted captures, deaths and risk are not calculated by the model. Only catchabilities and fits to the observed captures will be generated. This will lower the memory requirements for the model and may be useful for initial model-based explorations of the data.

Note that a custom model can also be provided to sefra(). In this instance we recommend first calling sefra(write_to = <file>) to write the stan code as a text file. This base code can then be modified by the user and re-compiled using sefra(read_from = <file>).

The function initial_values() can be used to create a list of initial values. It can also be called with the model as an argument, in which case a maximum posterior density estimate is returned, and initial fits are plotted. To check initial values:

mdl_ini <- initial_values(sefra_dat, mdl, iter = 200)

This can be useful for quickly exploring the structural assumptions of the model, and whether reasonable fits to the data are obtained. Typically, structural mis-specification of the species and fishery groups, or a deficiency in the data, will lead to stronger prior updates on the biological parameters.

When fitting the model, captures are summed per fishery group and capture code. Following summation, capture codes represent all captures at that taxonomic resolution or higher. For example, using the current data, the captures per capture code and fishery group are represented within the model as a two-dimensional matrix, with the fishery groups along the first dimension:

## `summarise()` has grouped output by 'id_fishery_group'. You can override using
## the `.groups` argument.
## Joining with `by = join_by(id_fishery_group, code)`
##           DIW DQS TWD DGA DST DWC DIZ THZ ALZ BLZ
## fishery_1  10  13   7   0   0   0   7   4   0  14
## fishery_2  16  10  14   0   0   0  12   1   0  11

The cumulative sum (or “inclusive sum”) of the capture codes used for fitting the model are:

##           DIW DQS TWD DGA DST DWC DIZ THZ ALZ BLZ
## fishery_1  10  13   7  23   7  23  30  11  41  55
## fishery_2  16  10  14  26  14  26  38  15  53  64

For the BLZ capture code, there are 25 captures recorded as BLZ in the empirical data, but a total of 119 BLZ captures when summed across captures recorded as BLZ or a higher taxonomic resolution. We refer to 25 as the empirical captures and 119 as the inclusive captures. Similarly for DIZ we have 19 empirical captures and 68 inclusive captures. This definition of inclusive captures is relevant to the model fit and downstream diagnostics.

Model run

Run the model using a standard call from the rstan package:

mdl_fit <- rstan::sampling(mdl, data = sefra_dat, verbose = TRUE, init = function() mdl_ini, chains = 2, iter = 200, control = list(adapt_delta = 0.95, stepsize = 0.1, max_treedepth = 100))
## Warning: The largest R-hat is NA, indicating chains have not mixed.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#r-hat
## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#bulk-ess
## Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#tail-ess

The different scales on which biological parameters are estimated makes it difficult for the MCMC chain to explore the full parameter space, and we therefore recommend the above settings for the control argument, which will assist convergence. We further recommend that uniform priors should only be used for p_breeding and adult_survival (if at all) and not for n_breeding_pairs or age_breeding.

Functions exist to extract and summarise relevant model outputs, to allow the user to create their own diagnostic plots and tables:

out <- sefraOutputs(sefra_dat, mdl_fit)
## constructed 'sefraOutputs' object
out_summary <- summary(out)

These are not explored further here.

Model diagnostics

Trace plots can be used to visualise convergence. For example, we can plot trace diagnostics for the biological parameters using:

trace_plot(mdl_fit, pars = c("p_breeding"), labels = paste0("P[", sefra_dat$species, "]"))

trace_plot(mdl_fit, pars = c("age_breeding"), labels = paste0("A[", sefra_dat$species, "]"))

trace_plot(mdl_fit, pars = c("adult_survival"), labels = paste0("S[", sefra_dat$species, "]"))

trace_plot(mdl_fit, pars = c("n_breeding_pairs"), labels = paste0("N[", sefra_dat$species, "]"))

trace_plot(mdl_fit, pars = c("p_obs_logit"))

To visualise prior updates to the biological input distributions, we use, for example:

plot_prior_update(mdl_fit, par = "n_breeding_pairs", labels = sefra_dat$species)

Only one biological parameter can be included per box-plot.

Diagnostic plots also exist for Rhat and Neff. For example:

par_labels <- expression(alpha[1], alpha[2], alpha[1][","][1], alpha[2][","][1], alpha[1][","][2], alpha[2][","][2], alpha[1][","][3], alpha[2][","][3], beta[1], beta[2], beta[1][","][1], beta[2][","][1], beta[1][","][2], beta[2][","][2], beta[1][","][3], beta[2][","][3])

plot_rhat(mdl_fit, pars = c("alpha_fishery_group", "alpha_species_group", "beta_fishery_group", "beta_species_group"), labels = par_labels)

plot_neff(mdl_fit, pars = c("alpha_fishery_group", "alpha_species_group", "beta_fishery_group", "beta_species_group"), labels = par_labels)

Perhaps the most useful diagnostic is provided from examination of fits to the capture data, and a specific function exists for this purpose:

captures(sefra_dat, mdl_fit)
## $captures
## # A tibble: 20 × 7
##    fishery_group code  captures   hat   med low_ci upp_ci
##    <chr>         <chr>    <int> <dbl> <dbl>  <dbl>  <dbl>
##  1 fishery_1     DIW         10  6.7      6  1      14.0 
##  2 fishery_1     DQS         13  8.62     8  2      15.0 
##  3 fishery_1     TWD          7  4.26     4  1       8.03
##  4 fishery_1     DGA          0  4.00     4  0      10   
##  5 fishery_1     DST          0  3.04     3  0       8.03
##  6 fishery_1     DWC          0  4.31     4  0.975   9   
##  7 fishery_1     DIZ          7  5.63     5  0      13   
##  8 fishery_1     THZ          4  3.1      3  0       7.03
##  9 fishery_1     ALZ          0  9.38     9  2      19.1 
## 10 fishery_1     BLZ         14 11.4     11  3      22.0 
## 11 fishery_2     DIW         16 11.2     11  5      19.0 
## 12 fishery_2     DQS         10  6.92     7  1      15   
## 13 fishery_2     TWD         14  8.12     8  3      15   
## 14 fishery_2     DGA          0  5.35     5  0      11.0 
## 15 fishery_2     DST          0  3.88     3  0      11   
## 16 fishery_2     DWC          0  5.21     5  0.975  12.0 
## 17 fishery_2     DIZ         12  7.10     6  1      17.0 
## 18 fishery_2     THZ          1  3.78     3  0      10   
## 19 fishery_2     ALZ          0 10.1      9  3      21.0 
## 20 fishery_2     BLZ         11 13.2     13  2.98   28   
## 
## $captures_inclusive
## # A tibble: 20 × 7
##    fishery_group code  captures   hat   med low_ci upp_ci
##    <chr>         <chr>    <int> <dbl> <dbl>  <dbl>  <dbl>
##  1 fishery_1     DIW         10  6.7      6   1     13   
##  2 fishery_1     DQS         13  8.48     8   2     17.0 
##  3 fishery_1     TWD          7  4.45     4   1      9.03
##  4 fishery_1     DGA         23 19.0     19  10     29.1 
##  5 fishery_1     DST          7  7.26     7   2.98  13   
##  6 fishery_1     DWC         23 24.0     23  14.0   36   
##  7 fishery_1     DIZ         30 28.8     29  18.0   43.1 
##  8 fishery_1     THZ         11 10.5     10   4     18.0 
##  9 fishery_1     ALZ         41 48.1     48  31     64   
## 10 fishery_1     BLZ         55 60.7     60  45.0   81   
## 11 fishery_2     DIW         16 11.0     11   3.98  19.0 
## 12 fishery_2     DQS         10  6.84     6   2     14.0 
## 13 fishery_2     TWD         14  8.43     8   3     16   
## 14 fishery_2     DGA         26 22.9     23  14     34   
## 15 fishery_2     DST         14 12.3     12   5     20.1 
## 16 fishery_2     DWC         26 28.9     28  16     45   
## 17 fishery_2     DIZ         38 35.2     35  22     51   
## 18 fishery_2     THZ         15 16.4     16   8     27   
## 19 fishery_2     ALZ         53 61.3     61  44.0   80   
## 20 fishery_2     BLZ         64 74.6     74  58     95.1

Further diagnostic plots will be developed as the project progresses.