[ Densities with Frequency ]
The two plots below convey roughly the same information.
The one on the left is a histogram where the Y axis represents frequencies (i.e. how often we see values within the range associated with each bin).
The one on the right is a kernel density estimate (KDE), where the Y axis represents density (the curve integrates to one).
I typically prefer the density plot (you can adjust the smoothing), but its Y axis is usually harder to interpret.
I know that KDE returns a density that integrates to 1 over the whole domain of the variable (from -Inf to Inf), but is there any way to map the PDF resulting from KDE to frequencies (e.g. scaling up the values to obtain frequencies)?
Is it just a matter of "scaling" of the axis? Or is there anything else involved?
Answer 1
You'll have to calculate density points first, and then plot. Read http://scikit-learn.org/stable/modules/density.html. Some code:
# KernelDensity is imported from sklearn.neighbors (the old sklearn.neighbors.kde path was removed)
from sklearn.neighbors import KernelDensity
import numpy as np
import matplotlib.pyplot as plt
# Replace X with your own data (the values you would pass to the histogram)
X = np.array([-1, -1, -2, -1, -3, -2, 1, 1, 2, 1, 3, 2])
# scikit-learn expects a 2-D array of shape (n_samples, n_features)
X = X.reshape(-1, 1)
kde = KernelDensity(kernel='gaussian', bandwidth=1).fit(X)
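# Points at which to evaluate the estimated density
x = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)
# score_samples returns the log-density, so exponentiate to get the density
density = np.exp(kde.score_samples(x))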
plt.plot(x, density)
plt.show()
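On the scaling question itself: as far as I can tell, it really is just a rescaling of the Y axis. A density value f(x) corresponds to an expected count of roughly n * bin_width * f(x), where n is the number of samples and bin_width is the width of the histogram bins you want to compare against. The sketch below illustrates this with NumPy only. The sample data, the number of bins, the bandwidth and the helper gaussian_kde_pdf are all assumptions made up for illustration; the helper is a hand-written Gaussian KDE, not the scikit-learn estimator above, and np.exp(kde.score_samples(x)) could be used in its place.

```python
import numpy as np


def gaussian_kde_pdf(data, grid, bandwidth):
    """Evaluate a Gaussian KDE of `data` at the points in `grid` (hypothetical helper)."""
    z = (grid[:, None] - data[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return kernels.sum(axis=1) / (len(data) * bandwidth)


# Illustrative sample; substitute your own 1-D data here
rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=500)

# Histogram counts (frequencies) with equal-width bins
counts, edges = np.histogram(data, bins=20)
bin_width = edges[1] - edges[0]

# KDE evaluated on a grid that extends well past the data so the tails are covered
grid = np.linspace(data.min() - 3.0, data.max() + 3.0, 2000)
density = gaussian_kde_pdf(data, grid, bandwidth=0.3)

# Rescale: expected count in a bin of width bin_width is about n * bin_width * f(x)
freq_curve = density * len(data) * bin_width

# Sanity checks via a simple Riemann sum over the grid
step = grid[1] - grid[0]
area_density = float(np.sum(density) * step)   # should be close to 1
area_freq = float(np.sum(freq_curve) * step)   # should be close to n * bin_width
```

Plotting freq_curve against grid on top of plt.hist(data, bins=edges) should put both on the same frequency axis. Note that the scale factor depends on the bin width, so the curve only lines up with a histogram that uses those same bins.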