cd(@__DIR__)
using Pkg
Pkg.activate(".")
Pkg.instantiate()
Activating project at `~/projects/research/natalia_causality/code`
Now that the distributions of the latent variables are estimated, I’ll try my hand at causal inference, using the PC algorithm.
using DataFrames
using CSV
using StatsBase
using StatsPlots
using Pipe
using CausalInference
using Turing
using Turing: Variational
using Flux
using TikzGraphs
using GLM
using Distributions
using LinearAlgebra
using MCMCChains
using Base.Threads
using BenchmarkTools
using Random
plotlyjs()
Plots.PlotlyJSBackend()
Since I have a posterior distribution over the diachronic changes, there are various ways to proceed.
The simplest is to work with the posterior means.
d_mean = DataFrame(
ap_entropy = mapslices(mean, Array(ap_entropy), dims=2) |> vec,
case = mapslices(mean, Array(case), dims=2) |> vec,
v1 = mapslices(mean, Array(v1), dims=2) |> vec,
vm = mapslices(mean, Array(vm), dims=2) |> vec,
vf = mapslices(mean, Array(vf), dims=2) |> vec
)
d_mean[1:10,:]
| Row | ap_entropy | case | v1 | vm | vf |
|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | -0.015836 | 0.240188 | 0.259927 | -0.50958 | 0.46933 |
| 2 | -0.587081 | 0.241987 | -0.529696 | 0.434387 | -0.287941 |
| 3 | -0.745881 | 0.79275 | -0.117156 | -0.00798073 | 0.321854 |
| 4 | 0.0607523 | 0.0247049 | 0.610228 | 0.397553 | -1.26498 |
| 5 | -0.0294164 | 0.0593561 | -0.475766 | -0.544955 | 1.32272 |
| 6 | 0.436713 | 0.792322 | 0.659979 | -0.419645 | 0.28013 |
| 7 | 0.127586 | 0.0475166 | 0.180979 | 0.0649245 | -0.281607 |
| 8 | 0.120782 | -0.0183261 | -0.442596 | 0.00519606 | 0.355897 |
| 9 | 1.45712 | 0.777806 | -0.55214 | 0.228028 | 0.407297 |
| 10 | 0.125889 | 0.0882851 | 0.0224346 | -0.186445 | 0.264206 |
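With a data frame of posterior means in hand, the PC algorithm can be run directly. A minimal sketch using `pcalg` with the Gaussian conditional-independence test `gausscitest`, both from CausalInference.jl (loaded above); the data frame here is a synthetic stand-in with the same columns as `d_mean`, so the snippet runs on its own:

```julia
using CausalInference, DataFrames, Graphs, Random

# Synthetic stand-in with the same five columns as d_mean above;
# in the notebook, d_mean itself would be passed instead.
Random.seed!(1)
d = DataFrame(ap_entropy = randn(200), case = randn(200),
              v1 = randn(200), vm = randn(200), vf = randn(200))

# PC algorithm with the Gaussian conditional-independence test at
# significance level 0.05; the result is an estimated CPDAG over
# the five variables, as a directed graph.
est_g = pcalg(d, 0.05, gausscitest)

collect(edges(est_g))  # inspect the recovered edges
```

The returned graph is a Graphs.jl `SimpleDiGraph`, so the usual graph functions (`nv`, `ne`, `edges`) apply, and it can be rendered with TikzGraphs as loaded above.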
Alternatively, we can use the entire posterior sample.
d_full = DataFrame(
ap_entropy = ap_entropy |> Array |> vec,
case = case |> Array |> vec,
v1 = v1 |> Array |> vec,
vm = vm |> Array |> vec,
vf = vf |> Array |> vec
)
d_full[1:10, :]
| Row | ap_entropy | case | v1 | vm | vf |
|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | -1.07893 | -0.526389 | -0.510786 | -0.535136 | -0.603544 |
| 2 | -0.718301 | 0.513604 | -0.8577 | 1.56809 | -0.896703 |
| 3 | -0.943592 | 0.534588 | -0.595396 | -1.24935 | 0.552006 |
| 4 | 2.36515 | -0.12337 | 0.133571 | 0.391609 | -1.21302 |
| 5 | 0.128232 | 1.07472 | -0.304634 | 0.0636439 | 3.14768 |
| 6 | 0.441998 | 0.99882 | 0.767479 | -0.330984 | 0.179163 |
| 7 | 0.199286 | -1.56847 | -0.761146 | 1.19747 | 0.0330949 |
| 8 | 2.11311 | -0.517772 | 0.0161979 | -1.00219 | -0.375763 |
| 9 | 1.19245 | 1.46839 | -1.13957 | -1.12433 | -0.124208 |
| 10 | 0.0502473 | -1.5388 | 2.00285 | -0.184938 | -0.349137 |
How sensitive is the outcome to the significance level?
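One way to probe this is to rerun the PC algorithm across a grid of significance levels and compare the resulting edge sets. A hedged sketch (again with synthetic stand-in data so it is self-contained; in the notebook, `d_mean` or `d_full` would be passed instead):

```julia
using CausalInference, DataFrames, Graphs, Random

# Synthetic stand-in with the same columns as the data frames above.
Random.seed!(1)
d = DataFrame(ap_entropy = randn(200), case = randn(200),
              v1 = randn(200), vm = randn(200), vf = randn(200))

# Sweep the significance level and count the recovered edges; an
# edge set that changes sharply across levels signals a fragile
# causal graph.
for alpha in (0.001, 0.01, 0.05, 0.1)
    g = pcalg(d, alpha, gausscitest)
    println("alpha = $alpha: ", ne(g), " edges")
end
```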
The results differ wildly. This confirms my general skepticism about Pearl-style causal inference in such contexts.
The PC algorithm assumes:

- causal sufficiency: every common cause of the measured variables is itself measured;
- faithfulness: every conditional independence in the data reflects the causal structure;
- an acyclic data-generating process;
- independent, identically distributed samples and a consistent conditional-independence test.

Typological data plausibly violate several of these: areal contact and shared inheritance introduce unmeasured confounders, and languages are not independent samples.
Therefore I suspect that Pearl-style causal inference simply cannot be used in typology. We have to look at other approaches, such as Granger causality or structural equation modeling.