cd(@__DIR__)
using Pkg
Pkg.activate(".")
Pkg.instantiate() Activating project at `~/projects/research/natalia_causality/code`
Now that the distribution of the latent variables are estimated, I’ll try my hand with causal inference, using th PC algorithm.
Activating project at `~/projects/research/natalia_causality/code`
d = CSV.File("../data/data_Oct2023_new.csv", missingstring="NA") |> DataFrame |> dropmissing
d.Glottocode[d.Glottocode .== "kose1239"] .= "awiy1238"
filter(x -> x ∉ ["sout3004", "sout3270"], d.Glottocode)
insertcols!(d, :case => ["no case", "case"][1 .+ d.Case])
d| Row | ISO | Glottocode | Language | Family | Area | N_Speakers | Case | AP_Entropy | VerbFinal | VerbMiddle | case |
|---|---|---|---|---|---|---|---|---|---|---|---|
| String3 | String15 | String | String31 | String15 | Int64 | Int64 | Float64 | Float64 | Float64 | String | |
| 1 | aai | arif1239 | Arifama-Miniafia | Austronesian | Oceania | 3470 | 0 | 0.751032 | 0.817204 | 0.150538 | no case |
| 2 | aau | abau1245 | Abau | Sepik | Oceania | 7500 | 1 | 0.615254 | 0.956522 | 0.0217391 | case |
| 3 | abt | ambu1247 | Ambulas | Ndu | Oceania | 33000 | 1 | 0.847862 | 0.666667 | 0.294118 | case |
| 4 | aby | anem1248 | Aneme Wake | Yareban | Oceania | 650 | 1 | 0.842658 | 0.65625 | 0.229167 | case |
| 5 | acd | giky1238 | Gikyode | Atlantic-Congo | Africa | 10400 | 0 | 0.591673 | 0.190476 | 0.738095 | no case |
| 6 | ace | achi1257 | Acehnese | Austronesian | Eurasia | 2840000 | 0 | 0.75617 | 0.128713 | 0.752475 | no case |
| 7 | acn | acha1249 | Longchuan Achang | Sino-Tibetan | Eurasia | 27700 | 1 | 0.837306 | 0.883562 | 0.0958904 | case |
| 8 | acr | achi1256 | Achi | Mayan | North America | 124000 | 0 | 0.721928 | 0.2 | 0.685714 | no case |
| 9 | acu | achu1248 | Achuar-Shiwiar | Chicham | South America | 3520 | 1 | 0.637387 | 0.774194 | 0.16129 | case |
| 10 | ade | adel1244 | Adele | Atlantic-Congo | Africa | 26400 | 0 | 0.558629 | 0.118012 | 0.763975 | no case |
| 11 | adj | adio1239 | Adioukrou | Atlantic-Congo | Africa | 159000 | 0 | 0.799759 | 0.257143 | 0.585714 | no case |
| 12 | aey | amel1241 | Amele | Nuclear Trans-New-Guinea | Oceania | 5300 | 0 | 0.510093 | 0.793814 | 0.14433 | no case |
| 13 | afr | afri1274 | Afrikaans | Indo-European | Africa | 17160000 | 0 | 0.424134 | 0.740979 | 0.199742 | no case |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 682 | zpi | sant1451 | Santa María Quiegolani Zapotec | Otomanguean | North America | 2000 | 0 | 0.642938 | 0.0545455 | 0.2 | no case |
| 683 | zpl | lach1249 | Lachixío Zapotec | Otomanguean | North America | 6500 | 0 | 0.451623 | 0.0810811 | 0.472973 | no case |
| 684 | zpm | mixt1426 | Mixtepec Zapotec | Otomanguean | North America | 7000 | 0 | 0.591673 | 0.0952381 | 0.428571 | no case |
| 685 | zpo | amat1238 | Amatlán Zapotec | Otomanguean | North America | 10000 | 0 | 0.759276 | 0.0487805 | 0.536585 | no case |
| 686 | zpq | zoog1238 | Zoogocho Zapotec | Otomanguean | North America | 1000 | 0 | 0.867282 | 0.2 | 0.488889 | no case |
| 687 | zpu | yala1267 | Yalálag Zapotec | Otomanguean | North America | 3500 | 0 | 0.693127 | 0.116279 | 0.372093 | no case |
| 688 | zpv | chic1274 | Chichicapan Zapotec | Otomanguean | North America | 2720 | 0 | 0.911342 | 0.122449 | 0.408163 | no case |
| 689 | zpz | texm1235 | Texmelucan Zapotec | Otomanguean | North America | 4630 | 0 | 0.90117 | 0.121951 | 0.487805 | no case |
| 690 | zsr | sout3004 | Southern Rincon Zapotec | Otomanguean | North America | 12000 | 0 | 0.996134 | 0.0487805 | 0.341463 | no case |
| 691 | ztq | quio1241 | Quioquitani-Quieri Zapotec | Otomanguean | North America | 4000 | 0 | 0.77935 | 0.0989011 | 0.604396 | no case |
| 692 | zty | yate1242 | Yatee Zapotec | Otomanguean | North America | 5000 | 0 | 0.884115 | 0.186047 | 0.418605 | no case |
| 693 | zul | zulu1248 | Zulu | Atlantic-Congo | Africa | 27300000 | 0 | 0.503258 | 0.0666667 | 0.777778 | no case |
Since I have a posterior distribution over the diachronic changes, there are various ways to proceed.
The simplest way is to work with the posterior means
d_mean = DataFrame(
ap_entropy = mapslices(mean, Array(ap_entropy), dims=2) |> vec,
case = mapslices(mean, Array(case), dims=2) |> vec,
v1 = mapslices(mean, Array(v1), dims=2) |> vec,
vm = mapslices(mean, Array(vm), dims=2) |> vec,
vf = mapslices(mean, Array(vf), dims=2) |> vec
)
d_mean[1:10,:]| Row | ap_entropy | case | v1 | vm | vf |
|---|---|---|---|---|---|
| Float64 | Float64 | Float64 | Float64 | Float64 | |
| 1 | -0.0561222 | 0.146859 | 0.312116 | -0.502353 | 0.534095 |
| 2 | -0.691529 | 0.190721 | -0.442719 | 0.441969 | -0.227042 |
| 3 | -0.417357 | 0.726538 | -0.145272 | -0.0389503 | 0.276306 |
| 4 | 0.00392528 | 0.0203605 | 0.585015 | 0.443387 | -1.29164 |
| 5 | 0.00558466 | 0.0380693 | -0.496684 | -0.500937 | 1.32527 |
| 6 | 0.265089 | 0.688952 | 0.650022 | -0.442836 | 0.268705 |
| 7 | 0.060017 | 0.0344144 | 0.156385 | 0.0529664 | -0.305342 |
| 8 | 0.154984 | 0.00466574 | -0.355771 | 0.0205768 | 0.357639 |
| 9 | 0.904509 | 0.719974 | -0.514243 | 0.247453 | 0.411839 |
| 10 | 0.232673 | 0.0342125 | 0.00857667 | -0.202634 | 0.314142 |
How sensitive is the outcome to the significance level?
The results are partially contradictory, but the most likely explanation is that the data is not enough to make a clear inference.