Map saved to _img/sound_inventory_map.svg
Size: 9.765879 by 5.01747 inches
Map saved to _img/logpop_map.svg
Size: 9.765879 by 5.01747 inches
ggplot(sound_inventory_population, aes(x = logpop, y = nSegments)) +
  geom_point(aes(color = Macroarea), alpha = 0.6) +        # Scatter plot with colors by Macroarea
  geom_smooth(method = "lm", se = TRUE, color = "black") + # Trend line for the entire data frame
  scale_y_log10() +
  geom_smooth(aes(group = Macroarea, color = Macroarea),
              method = "lm", se = FALSE) +                 # Trend lines for each Macroarea
  labs(title = "Scatter plot of n_segments vs logpop",
       x = "Population (log)",
       y = "Number of Segments",
       color = "Macroarea") +                              # Label for the legend
  theme_minimal() -> sound_pop_scatter

ggsave(sound_pop_scatter, file = "_img/sound_pop_scatter.svg")
Saving 6.67 x 6.67 in image
`geom_smooth()` using formula 'y ~ x'
`geom_smooth()` using formula 'y ~ x'
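The `medians` data frame used in the next plot is not defined in this excerpt; it presumably aggregates the data per macroarea. A minimal sketch, with column names inferred from the aes() call below (not necessarily the author's exact code):

# hypothetical construction of the per-macroarea medians used in the next plot
medians <- sound_inventory_population %>%
  group_by(Macroarea) %>%
  summarise(median_logpop = median(logpop),
            median_n_segments = median(nSegments),
            .groups = "drop")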
sound_pop_median_scatter <- ggplot(medians, aes(x = median_logpop, y = median_n_segments)) +
  scale_y_log10() +
  geom_point(aes(color = Macroarea), size = 4) +           # Scatter plot with colors by Macroarea
  geom_text(aes(label = Macroarea), vjust = "inward",
            hjust = "inward", check_overlap = TRUE) +      # Labels for each macroarea
  geom_smooth(method = "lm", se = TRUE,
              color = "black", fill = "lightgray") +       # Trend line with uncertainty interval
  labs(title = "Scatter plot of medians by Macroarea",
       x = "Median Log Population",
       y = "Median Number of Segments",
       color = "Macroarea") +                              # Label for the legend
  theme_minimal()

ggsave(sound_pop_median_scatter, file = "_img/sound_pop_median_scatter.svg")
Saving 6.67 x 6.67 in image
`geom_smooth()` using formula 'y ~ x'
The correlation seems to be driven mostly by differences between macroareas rather than by variation within them.
Warning message:
“There were 1 divergent transitions after warmup. Increasing adapt_delta above 0.99 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup”
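A single divergent transition out of 4000 post-warmup draws is usually harmless, but if divergences persist the model can be refit with a stricter adaptation target. A sketch (assuming the warning came from the model summarised below; fit_c_4_refit is a hypothetical name and the control value an assumption):

# assumed remedy: refit with a higher adapt_delta (slower sampling, fewer divergences)
fit_c_4_refit <- update(fit_c_4, control = list(adapt_delta = 0.999))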
Family: poisson
Links: mu = log
Formula: nSegments ~ logpop + (1 | Macroarea) + (1 | Family)
Data: sound_inventory_population (Number of observations: 1645)
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup samples = 4000
Group-Level Effects:
~Family (Number of levels: 159)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.27 0.02 0.24 0.32 1.00 957 1384
~Macroarea (Number of levels: 6)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.31 0.16 0.15 0.71 1.00 1164 1785
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 3.42 0.15 3.14 3.68 1.00 1258 1576
logpop -0.01 0.00 -0.02 -0.00 1.00 7312 3511
Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
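For reference, a summary like the one above would be produced by a call along these lines (a sketch reconstructed from the Formula, Family, and Samples lines; priors and control settings are assumptions, not the original code):

# hierarchical Poisson regression of inventory size on log population,
# with varying intercepts for macroarea and language family
fit_c_4 <- brm(
  nSegments ~ logpop + (1 | Macroarea) + (1 | Family),
  family  = poisson(),                      # log link is the default for poisson
  data    = sound_inventory_population,
  chains  = 4, iter = 2000, warmup = 1000,  # matches the Samples line above
  control = list(adapt_delta = 0.99)        # assumed, given the divergence warning
)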
posterior_samples <- as.mcmc(fit_c_4)

mcmc_areas(
  posterior_samples,
  pars = "b_logpop",
  prob = 0.95                      # 95% HPD interval
) +
  ggtitle("MCMC Density Plot") +
  xlab("log(population) Coefficient") -> sound_inventories_slope_hpd_4

ggsave(sound_inventories_slope_hpd_4, file = "_img/sound_inventories_slope_hpd_4.svg")
Saving 6.67 x 6.67 in image
# Compute LOO for each model
loo_1 <- loo(fit_c_1)
loo_2 <- loo(fit_c_2)
loo_3 <- loo(fit_c_3)
loo_4 <- loo(fit_c_4)
Warning message:
“Found 57 observations with a pareto_k > 0.7 in model 'fit_c_3'. It is recommended to set 'moment_match = TRUE' in order to perform moment matching for problematic observations. ”
Warning message:
“Found 41 observations with a pareto_k > 0.7 in model 'fit_c_4'. It is recommended to set 'moment_match = TRUE' in order to perform moment matching for problematic observations. ”
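As the warnings suggest, the observations with high pareto_k can be re-evaluated with moment matching before comparing the models (a sketch; this makes the LOO computation noticeably slower):

# re-run LOO with moment matching for the problematic observations
loo_3 <- loo(fit_c_3, moment_match = TRUE)
loo_4 <- loo(fit_c_4, moment_match = TRUE)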
# Compare the models
loo_comparison <- loo_compare(loo_1, loo_2, loo_3, loo_4)

# Print the comparison
print(loo_comparison)
Model 4 provides the best fit to the data. It estimates a negative coefficient for log(population): contrary to the initial impression, languages with larger speaker populations tend to have slightly smaller phoneme inventories once we control for family and macroarea.
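One way to quantify "slightly smaller" is the posterior probability that the slope is negative, for instance with brms::hypothesis() (a sketch, assuming fit_c_4 is the model summarised above):

# posterior probability and evidence ratio for a negative population effect
hypothesis(fit_c_4, "logpop < 0")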