A Visual Exploration of Gaussian Processes
If you've ever used "Bayesian optimization" to choose hyperparameters, there was almost certainly a Gaussian process under the hood.
The article focuses on the case where you have a finite number of test points, which is probably the right choice for an article like this. Still, there is another interpretation of Gaussian processes as actual stochastic processes (hence the name): a probability distribution over a set of functions.
I would have found an article that covered that interpretation even more helpful, although I'm not sure an easy-to-follow version could exist.
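In case it's useful, here's a minimal sketch of that "distribution over functions" view (plain NumPy; the zero-mean prior and RBF kernel are my choices, not anything from the article): sampling a GP at a finite grid of inputs is just sampling a multivariate normal whose covariance comes from the kernel.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) covariance between two sets of 1-D points."""
    sqdist = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sqdist / lengthscale**2)

x = np.linspace(-5, 5, 200)           # finite grid of input locations
K = rbf_kernel(x, x)                  # prior covariance over the grid
K += 1e-8 * np.eye(len(x))            # jitter for numerical stability

# Each row is one "function" drawn from the GP prior, evaluated on the grid.
samples = np.random.multivariate_normal(np.zeros(len(x)), K, size=5)
```

Every finite marginal of the process is Gaussian; the grid just makes that concrete enough to plot.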
The point that combining kernels lets you compose arbitrarily complex functions could come at the beginning. That's sort of why GPs are so exciting in the first place.
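To make that concrete (toy code of mine, not from the article): sums and products of valid kernels are themselves valid kernels, which is exactly what makes composition work. For example, multiplying an RBF kernel by a periodic kernel gives locally periodic functions:

```python
import numpy as np

def rbf(x1, x2, ls=1.0):
    """Squared-exponential kernel."""
    return np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / ls**2)

def periodic(x1, x2, period=2.0, ls=1.0):
    """Standard periodic (exp-sine-squared) kernel."""
    d = np.pi * np.abs(x1[:, None] - x2[None, :]) / period
    return np.exp(-2.0 * np.sin(d) ** 2 / ls**2)

def locally_periodic(x1, x2):
    # A product of valid kernels is again a valid kernel:
    # periodic structure whose shape drifts slowly across inputs.
    return rbf(x1, x2, ls=5.0) * periodic(x1, x2)
```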
In particular, there's a kernel we could call a "change point," which is a way to fit a totally different model before a point in time versus after. It's used frequently in models fitted by the Automatic Statistician (https://automaticstatistician.com/examples/), which in my opinion is the state of the art of what you can use GPs for. They also developed a Lisp-like representation of GP kernels, which lets them sample functions, fit them, and publish the simplest ones.
You can see an example of "create GP functions and try fitting them" here: https://github.com/probcomp/notebook/blob/master/tutorials/e... . Near the end of the notebook, you can see the "source code" of the fitted GP function. Note that this implementation supports change points but doesn't happen to need them on the sample data.
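For anyone wondering what a change-point kernel actually looks like, here's a rough sketch of the usual construction (my paraphrase of the Automatic Statistician papers; the names and parameters are mine): a sigmoid smoothly hands the covariance off from one kernel to another around a change location.

```python
import numpy as np

def sigmoid(x, loc, steepness=1.0):
    """~0 well left of loc, ~1 well right of it."""
    return 1.0 / (1.0 + np.exp(-steepness * (x - loc)))

def changepoint_kernel(k_before, k_after, loc, steepness=1.0):
    """Blend two kernels: k_before dominates for x << loc, k_after for x >> loc."""
    def k(x1, x2):
        s1 = sigmoid(x1, loc, steepness)
        s2 = sigmoid(x2, loc, steepness)
        return ((1 - s1)[:, None] * (1 - s2)[None, :] * k_before(x1, x2)
                + s1[:, None] * s2[None, :] * k_after(x1, x2))
    return k
```

Because each term is a valid kernel scaled by a function of each argument, the blend stays positive semi-definite, so it's a legitimate GP kernel.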
More generally, I wonder in which applications GPs shine.
On the one hand, with their emphasis on time-series data, GPs have a lot to offer finance, especially options pricing. On the other hand, a GP often boils down to "the near future looks a lot like the near past," which most people already know.
Clearly, what we want to know is: when will change points occur? Whoever cracks that nut will have found GPs their breakthrough application.
The example above suggests that some form of search and fitting over generated "Gaussian process programs" is a good first step.
I'm sorry to be a little off-topic here, but can someone please tell me how the author got those math equations looking like that? They look like the usual LaTeX font, but they're neither SVG nor PNG output of LaTeX as I first thought. How was this done?
I like the visualizations for showing how different kernels work, but I tend to prefer my explanation of GPs:
https://planspace.org/20181226-gaussian_processes_are_not_so...
Would love to get more feedback as well!
For X (your test data) and Y (your training data), we have our prior: X ~ N(0, <kernel>) and Y ~ N(<training values>, <identity>). For the joint distribution, I can accept that the covariance will use the same kernel function (albeit on |X| + |Y| dimensions), but what would the mean be?
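For what it's worth, here's the construction as I understand it from the standard treatment (Rasmussen & Williams, ch. 2): under the usual zero-mean convention the joint prior mean is simply zero for both blocks, and the training values enter later, when you condition on them, rather than as the prior mean. A minimal sketch, with all concrete values mine:

```python
import numpy as np

def rbf(a, b, ls=1.0):
    """Squared-exponential kernel."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

X_train = np.array([-2.0, 0.0, 1.5])   # training inputs (toy values)
X_test = np.linspace(-3.0, 3.0, 50)    # test inputs

# Joint prior over [f(X_train); f(X_test)]: zero mean, block covariance
# built from the same kernel evaluated on the combined set of inputs.
mean = np.zeros(len(X_train) + len(X_test))
cov = np.block([
    [rbf(X_train, X_train), rbf(X_train, X_test)],
    [rbf(X_test, X_train),  rbf(X_test, X_test)],
])
```

(A nonzero prior mean function m(x) is also possible; then the joint mean is just m evaluated at all the training and test inputs.)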
Not much visual though is it ;)
Is there supposed to be a Distill banner on this website?
"The mean of this probability distribution then represents the most probable characterization of the data."
Shouldn't it be the mode?
Off topic, but does anyone know which chart library is being used here?