
Commit 01543f6

run precommit checks
1 parent c966c07 · commit 01543f6

2 files changed (+2, -77 lines)


examples/gaussian_processes/GP-Heteroskedastic.ipynb

Lines changed: 2 additions & 2 deletions
@@ -59,7 +59,7 @@
 "\n",
 "from scipy.spatial.distance import pdist\n",
 "\n",
-"%config InlineBackend.figure_format ='retina'\n"
+"%config InlineBackend.figure_format ='retina'"
 ]
 },
 {
@@ -1733,7 +1733,7 @@
 ],
 "source": [
 "%load_ext watermark\n",
-"%watermark -n -u -v -iv -w -p xarray\n"
+"%watermark -n -u -v -iv -w -p xarray"
 ]
 },
 {
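
The .ipynb change above only strips the trailing "\n" from the final source line of two code cells. A minimal sketch of that kind of normalization with `nbformat` is shown below; the path is taken from the diff, but the logic is an assumed illustration, not necessarily what the repository's pre-commit hook actually does.

```python
# Hypothetical normalization sketch (assumed behavior, not the actual hook):
# trim trailing newlines from each code cell's source so the last source line
# in the notebook JSON carries no "\n", matching the change shown above.
import nbformat

PATH = "examples/gaussian_processes/GP-Heteroskedastic.ipynb"  # path from the diff

nb = nbformat.read(PATH, as_version=4)
for cell in nb.cells:
    if cell.cell_type == "code":
        # nbformat exposes the cell source as a single string; strip trailing newlines only
        cell.source = cell.source.rstrip("\n")
nbformat.write(nb, PATH)
```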

examples/gaussian_processes/GP-Heteroskedastic.myst.md

Lines changed: 0 additions & 75 deletions
@@ -31,8 +31,6 @@ This notebook will work through several approaches to heteroskedastic modeling w
 
 ## Data
 
-+++
-
 ```{code-cell} ipython3
 import arviz as az
 import matplotlib.pyplot as plt
@@ -45,16 +43,12 @@ from scipy.spatial.distance import pdist
 %config InlineBackend.figure_format ='retina'
 ```
 
-+++
-
 ```{code-cell} ipython3
 SEED = 2020
 rng = np.random.default_rng(SEED)
 az.style.use("arviz-darkgrid")
 ```
 
-+++
-
 ```{code-cell} ipython3
 def signal(x):
     return x / 2 + np.sin(2 * np.pi * x) / 5
@@ -85,12 +79,8 @@ plt.plot(X, y, "C0o")
 plt.errorbar(X_, y, y_err, color="C0")
 ```
 
-+++
-
 ## Helper and plotting functions
 
-+++
-
 ```{code-cell} ipython3
 def get_ℓ_prior(points):
     """Calculates mean and sd for InverseGamma prior on lengthscale"""
@@ -106,8 +96,6 @@ def get_ℓ_prior(points):
 ℓ_μ, ℓ_σ = (stat for stat in get_ℓ_prior(X_))
 ```
 
-+++
-
 ```{code-cell} ipython3
 def plot_inducing_points(ax):
     yl = ax.get_ylim()
@@ -193,16 +181,12 @@ def plot_total(ax, mean_samples, var_samples=None, bootstrap=True, n_boots=100):
     ax.legend(loc="upper left")
 ```
 
-+++
-
 ## Homoskedastic GP
 
 +++
 
 First, let's fit a standard homoskedastic GP using PyMC's `Marginal Likelihood` implementation. Here and throughout this notebook we'll use an informative prior for length scale as suggested by [Michael Betancourt](https://betanalpha.github.io/assets/case_studies/gp_part3/part3.html#4_adding_an_informative_prior_for_the_length_scale). We could use `pm.find_MAP()` and `.predict` for even faster inference and prediction, with similar results, but for direct comparison to the other models we'll use NUTS and `.conditional` instead, which run fast enough.
 
-+++
-
 ```{code-cell} ipython3
 with pm.Model() as model_hm:
     ℓ = pm.InverseGamma("ℓ", mu=ℓ_μ, sigma=ℓ_σ)
@@ -223,8 +207,6 @@ with model_hm:
     samples_hm = pm.sample_posterior_predictive(trace_hm, var_names=["mu_pred_hm", "noisy_pred_hm"])
 ```
 
-+++
-
 ```{code-cell} ipython3
 _, axs = plt.subplots(1, 3, figsize=(18, 4))
 mu_samples = samples_hm.posterior_predictive["mu_pred_hm"].values.reshape(-1, len(Xnew))
@@ -234,8 +216,6 @@ plot_var(axs[1], noisy_samples.var(axis=0))
 plot_total(axs[2], noisy_samples)
 ```
 
-+++
-
 Here we've plotted our understanding of the mean behavior with the corresponding epistemic uncertainty on the left, our understanding of the variance or aleatoric uncertainty in the middle, and integrated all sources of uncertainty on the right. This model captures the mean behavior well, but we can see that it overestimates the noise in the lower regime while underestimating the noise in the upper regime, as expected.
 
 +++
@@ -246,8 +226,6 @@ Here we've plotted our understanding of the mean behavior with the corresponding
 
 The simplest approach to modeling a heteroskedastic system is to fit a GP on the mean at each point along the domain and supply the standard deviation as weights.
 
-+++
-
 ```{code-cell} ipython3
 with pm.Model() as model_wt:
     ℓ = pm.InverseGamma("ℓ", mu=ℓ_μ, sigma=ℓ_σ)
@@ -265,8 +243,6 @@ with model_wt:
     samples_wt = pm.sample_posterior_predictive(trace_wt, var_names=["mu_pred_wt"])
 ```
 
-+++
-
 ```{code-cell} ipython3
 _, axs = plt.subplots(1, 3, figsize=(18, 4))
 mu_samples = samples_wt.posterior_predictive["mu_pred_wt"].values.reshape(-1, len(Xnew))
@@ -276,8 +252,6 @@ plot_var(axs[1], mu_samples.var(axis=0))
 plot_total(axs[2], mu_samples)
 ```
 
-+++
-
 This approach captured slightly more nuance in the overall uncertainty than the homoskedastic GP, but still underestimated the variance within both the observed regimes. Note that the variance displayed by this model is purely epistemic: our understanding of the mean behavior is weighted by the uncertainty in our observations, but we didn't include a component to account for aleatoric noise.
 
 +++
@@ -288,8 +262,6 @@ This approach captured slightly more nuance in the overall uncertainty than the
 
 Now let's model the mean and the log of the variance as separate GPs through PyMC's `Latent` implementation, feeding both into a `Normal` likelihood. Note that we add a small amount of diagonal noise to the individual covariances in order to stabilize them for inversion.
 
-+++
-
 ```{code-cell} ipython3
 with pm.Model() as model_ht:
     ℓ = pm.InverseGamma("ℓ", mu=ℓ_μ, sigma=ℓ_σ)
@@ -317,8 +289,6 @@ with model_ht:
     samples_ht = pm.sample_posterior_predictive(trace_ht, var_names=["μ_pred_ht", "lg_σ_pred_ht"])
 ```
 
-+++
-
 ```{code-cell} ipython3
 _, axs = plt.subplots(1, 3, figsize=(18, 4))
 μ_samples = samples_ht.posterior_predictive["μ_pred_ht"].values.reshape(-1, len(Xnew))
@@ -328,8 +298,6 @@ plot_var(axs[1], σ_samples**2)
 plot_total(axs[2], μ_samples, σ_samples**2)
 ```
 
-+++
-
 That looks much better! We've accurately captured the mean behavior of our system along with an understanding of the underlying trend in the variance, with appropriate uncertainty. Crucially, the aggregate behavior of the model integrates both epistemic *and* aleatoric uncertainty, and the ~5% of our observations that fall outside the 2σ band are more or less evenly distributed across the domain. However, that took *over two hours* to sample only 4k NUTS iterations. Due to the expense of the requisite matrix inversions, GPs are notoriously inefficient for large data sets. Let's reformulate this model using a sparse approximation.
 
 +++
@@ -340,8 +308,6 @@ That looks much better! We've accurately captured the mean behavior of our syste
 
 Sparse approximations to GPs use a small set of *inducing points* to condition the model, vastly improve the speed of inference and somewhat improve memory consumption. PyMC doesn't have an implementation for sparse latent GPs ([yet](https://github.com/pymc-devs/pymc/pull/2951)), but we can throw together our own real quick using Bill Engel's [DTC latent GP example](https://gist.github.com/bwengals/a0357d75d2083657a2eac85947381a44). These inducing points can be specified in a variety of ways, such as via the popular k-means initialization or even optimized as part of the model, but since our observations are evenly distributed we can make do with simply a subset of our unique input values.
 
-+++
-
 ```{code-cell} ipython3
 class SparseLatent:
     def __init__(self, cov_func):
@@ -374,8 +340,6 @@ class SparseLatent:
         return mu_pred
 ```
 
-+++
-
 ```{code-cell} ipython3
 # Explicitly specify inducing points by downsampling our input vector
 Xu = X[1::2]
@@ -405,8 +369,6 @@ with model_hts:
     samples_hts = pm.sample_posterior_predictive(trace_hts, var_names=["μ_pred", "lg_σ_pred"])
 ```
 
-+++
-
 ```{code-cell} ipython3
 _, axs = plt.subplots(1, 3, figsize=(18, 4))
 μ_samples = samples_hts.posterior_predictive["μ_pred"].values.reshape(-1, len(Xnew))
@@ -419,8 +381,6 @@ plot_total(axs[2], μ_samples, σ_samples**2)
 plot_inducing_points(axs[2])
 ```
 
-+++
-
 That was ~8x faster with nearly indistinguishable results, and fewer divergences as well.
 
 +++
@@ -431,8 +391,6 @@ That was ~8x faster with nearly indistinguishable results, and fewer divergences
 
 So far, we've modeled the mean and noise of our system as independent. However, there may be scenarios where we expect them to be correlated, for example if higher measurement values are expected to have greater noise. Here, we'll explicitly model this correlation through a covariance function that is a Kronecker product of the spatial kernel we've used previously and a `Coregion` kernel, as suggested by Bill Engel [here](https://discourse.pymc.io/t/coregionalization-model-for-two-separable-multidimensional-gaussian-process/2550/4). This is an implementation of the Linear Model of Coregionalization, which treats each correlated GP as a linear combination of a small number of independent basis functions, which are themselves GPs. We first add a categorical dimension to the domain of our observations to indicate whether the mean or variance is being considered, then unpack the respective components before feeding them into a `Normal` likelihood as above.
 
-+++
-
 ```{code-cell} ipython3
 def add_coreg_idx(x):
     return np.hstack([np.tile(x, (2, 1)), np.vstack([np.zeros(x.shape), np.ones(x.shape)])])
@@ -468,8 +426,6 @@ with model_htsc:
     samples_htsc = pm.sample_posterior_predictive(trace_htsc, var_names=["c_mu_pred"])
 ```
 
-+++
-
 ```{code-cell} ipython3
 c_mu_pred = samples_htsc.posterior_predictive["c_mu_pred"].values
 c_mu_pred = c_mu_pred.reshape(-1, c_mu_pred.shape[-1])
@@ -487,19 +443,13 @@ plot_total(axs[2], μ_samples, σ_samples**2)
 plot_inducing_points(axs[2])
 ```
 
-+++
-
 We can look at the learned correlation between the mean and variance by inspecting the covariance matrix $\bf{B}$ constructed via $\mathbf{B} \equiv \mathbf{WW}^T+diag(\kappa)$:
 
-+++
-
 ```{code-cell} ipython3
 with model_htsc:
     B_samples = pm.sample_posterior_predictive(trace_htsc, var_names=["W", "kappa"])
 ```
 
-+++
-
 ```{code-cell} ipython3
 # Keep in mind that the first dimension in all arrays is the sampling dimension
 W = B_samples.posterior_predictive["W"].values
@@ -517,8 +467,6 @@ B = WW_T + diag_kappa
 B.mean(axis=0)
 ```
 
-+++
-
 ```{code-cell} ipython3
 sd = np.sqrt(np.diagonal(B, axis1=1, axis2=2))
 outer_sd = np.einsum("ij,ik->ijk", sd, sd)
@@ -528,8 +476,6 @@ print(f"Median correlation: {np.percentile(correlation,50,axis=0)[0,1]:0.3f}")
 print(f"97.5%ile correlation: {np.percentile(correlation,97.5,axis=0)[0,1]:0.3f}")
 ```
 
-+++
-
 The model has inadvertently learned that the mean and noise are negatively correlated, albeit with a wide credible interval.
 
 +++
@@ -544,8 +490,6 @@ The three latent approaches shown here varied in their complexity and efficiency
 
 ### Regression surfaces
 
-+++
-
 ```{code-cell} ipython3
 _, axs = plt.subplots(1, 3, figsize=(18, 4))
 
@@ -578,38 +522,24 @@ axs[0].legend().remove()
 axs[1].legend().remove()
 ```
 
-+++
-
 ### Latent model convergence
 
-+++
-
 ```{code-cell} ipython3
 display(az.summary(trace_ht).sort_values("ess_bulk").iloc[:5])
 ```
 
-+++
-
 ### Sparse Latent model convergence
 
-+++
-
 ```{code-cell} ipython3
 display(az.summary(trace_hts).sort_values("ess_bulk").iloc[:5])
 ```
 
-+++
-
 ### Correlated Sparse Latent model convergence
 
-+++
-
 ```{code-cell} ipython3
 display(az.summary(trace_htsc).sort_values("ess_bulk").iloc[:5])
 ```
 
-+++
-
 ## Authors
 - This notebook was written by John Goertz on 5 May, 2021.
 - Updated by Christopher Krapu on December 21, 2025 for v5 compatibility.
@@ -625,15 +555,10 @@ display(az.summary(trace_htsc).sort_values("ess_bulk").iloc[:5])
 
 ## Watermark
 
-+++
-
 ```{code-cell} ipython3
 %load_ext watermark
 %watermark -n -u -v -iv -w -p xarray
 ```
 
-+++
-
 :::{include} ../page_footer.md
 :::
-
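
Every deletion in the MyST pairing above is a `+++` cell break (plus the blank line after it) sitting directly next to a code cell; jupytext only needs explicit breaks between consecutive markdown blocks, so a break adjacent to a `{code-cell}` fence is redundant. A rough sketch of such a cleanup pass is below; the helper names and matching rules are illustrative assumptions, not the repository's actual pre-commit hook.

```python
# Assumed illustration of the MyST cleanup shown above: drop a "+++" break
# (and the single blank line after it) when it directly follows a closing
# fence or directly precedes a ```{code-cell} fence, since the fence itself
# already delimits a new cell. Breaks between two markdown blocks are kept.
from pathlib import Path

path = Path("examples/gaussian_processes/GP-Heteroskedastic.myst.md")  # path from the diff
lines = path.read_text(encoding="utf-8").split("\n")


def prev_nonblank(idx: int) -> str:
    """Nearest non-blank line above lines[idx], or '' if there is none."""
    for j in range(idx - 1, -1, -1):
        if lines[j].strip():
            return lines[j]
    return ""


def next_nonblank(idx: int) -> str:
    """Nearest non-blank line below lines[idx], or '' if there is none."""
    for j in range(idx + 1, len(lines)):
        if lines[j].strip():
            return lines[j]
    return ""


cleaned = []
i = 0
while i < len(lines):
    if lines[i].strip() == "+++":
        follows_code_cell = prev_nonblank(i).strip() == "```"  # break right after a closing fence
        precedes_code_cell = next_nonblank(i).startswith("```{code-cell}")  # break right before a code cell
        if follows_code_cell or precedes_code_cell:
            # Skip the redundant break and the single blank line that follows it
            if i + 1 < len(lines) and not lines[i + 1].strip():
                i += 1
            i += 1
            continue
    cleaned.append(lines[i])
    i += 1

path.write_text("\n".join(cleaned), encoding="utf-8")
```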