Module Rimu/StatsTools
The module Rimu/StatsTools
contains helper function for analysis and post processing of Monte Carlo data.
Usage
The module is not exported by default. In oder to use its functions without explicitly specifying the submodule, import the module with
using Rimu, Rimu.StatsTools
Blocking analysis
For blocking analysis of a single time series use blocking_analysis
, or mean_and_se
.
For blocking analysis of a couple of time series where the ration of means is the quantity of interest use ratio_of_means
.
Exported
Rimu.StatsTools
— ModuleTools for the statistical analysis of Monte Carlo data.
Exports:
Rimu.StatsTools.blocking_analysis
— Methodblocking_analysis(v::AbstractVector; α = 0.01, corrected = true)
-> BlockingResult(mean, err, err_err, p_cov, k, blocks)
Compute the sample mean mean
and estimate the standard deviation of the mean (standard error) err
of a correlated time series using the blocking algorithm from Flyvberg and Peterson JCP (1989) and the M test of Jonsson PRE (2018) at significance level $1-α$. k
is the number of blocking transformations required to pass the hypothesis test for an uncorrelated time series and err_err
the estimated standard error or err
.
If decorrelating the time series fails according to the M test, NaN
is returned as the standard error and -1
for k
. corrected
controls whether bias correction for variances is used.
See BlockingResult
.
Rimu.StatsTools.mean_and_se
— Methodmean_and_se(v::AbstractVector; α = 0.01, corrected::Bool=true) -> mean, err
mean_and_se(r::BlockingResult) -> mean, err
Return the mean and standard error (as a tuple) of a time series obtained from blocking_analysis
. See also BlockingResult
.
Rimu.StatsTools.ratio_of_means
— Methodratio_of_means(num, denom; α = 0.01, corrected = true, mc_samples = 10_000) -> r
Estimate the ratio of mean(num)/mean(denom)
assuming that num
and denom
are possibly correlated time series. A blocking analysis with m-test is used to uncorrelate the time series, see blocking_analysis()
. The remaining standard error and correlation of the means is propagated using MonteCarloMeasurements
. The results are reported as a RatioBlockingResult
.
Robust estimates for the ratio are obtained from median(r)
and confidence intervals from quantile()
, e.g. quantile(r, [0.025, 0.975])
for the 95% confidence interval.
Estimates from linear uncertainty propagation are returned as r.f
and r.σ_f
using x_by_y_linear()
.
Rimu.StatsTools.med_and_errs
— Methodmed_and_errs(p) -> (; med, err1_l, err1_u, err2_l, err2_u)
Convenience function for extracting plottable data from a distribution or an uncertain object created by the packages MonteCarloMeasurements
or Measurements
. Returns the median med
and the lower err1_l
and upper err1_u
standard error (for 1σ or 68% confidence inteval). err2_l
and err2_u
provide the lower and upper error limits for the 2σ or 95% confidence interval.
Example:
julia> results = [Particles(d) for d in datasets] # Particles[]
julia> res_w_errs = med_and_errs.(results) # Vector of NamedTuple's with standard errors
julia> res_df = DataFrame(res_w_errs) # results as DataFrame with lower an upper error
1×5 DataFrame
Row │ med err1_l err1_u err2_l err2_u
│ Float64 Float64 Float64 Float64 Float64
─────┼────────────────────────────────────────────────────
1 │ 1.01325 0.0173805 0.0183057 0.034042 0.0366713
Rimu.StatsTools.ratio_with_errs
— Methodratio_with_errs(r::RatioBlockingResult)
-> (;ratio=med, err1_l, err1_u, err2_l, err2_u, f, σ_f, δ_y, k, success)
Convenience function for extracting plottable data from RatioBlockingResult
. Returns NamedTuple
with median and standard error of r
extracted by p_to_errs()
. See also ratio_of_means()
.
Example:
julia> results = [ratio_of_means(n[i], d[i]; args...) for i in datasets]
julia> res_w_errs = ratio_with_errs.(results) # Vector of NamedTuple's with standard errors
julia> res_df = DataFrame(res_w_errs) # results as DataFrame with lower an upper error
1×10 DataFrame
Row │ ratio err1_l err1_u err2_l err2_u f σ_f δ_y k success
│ Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Int64 Bool
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 1.01325 0.0173805 0.0183057 0.034042 0.0366713 1.01361 0.0181869 0.0128806 2 true
Rimu.StatsTools.replica_fidelity
— Methodreplica_fidelity(rr::Tuple; p_field = :hproj, skip = 0)
Compute the fidelity of the average coefficient vector and the projector defined in p_field
from the result of replica fci_qmc!()
passed as argument rr
(a tuple of DataFrames). Calls ratio_of_means()
to perform a blocking analysis on a ratio of the means of separate time series and returns a RatioBlockingResult
. The first skip
steps in the time series are skipped.
The fidelity of states |ψ⟩
and |ϕ⟩
is defined as
\[F(ψ,ϕ) = \frac{|⟨ψ|ϕ⟩|^2}{⟨ψ|ψ⟩⟨ϕ|ϕ⟩} .\]
Specifically, replica_fidelity
computes
\[F(\mathbf{v},⟨\mathbf{c}⟩) = \frac{⟨(\mathbf{c}_A⋅\mathbf{v})(\mathbf{v}⋅\mathbf{c}_B)⟩} {⟨\mathbf{c}_A⋅\mathbf{c}_B⟩} ,\]
where v
is the projector specified by p_field
, which is assumed to be normalised to unity with the two-norm (i.e. v⋅v == 1
), and $\mathbf{c}_A$ and $\mathbf{c}_B$ are two replica coefficient vectors.
Rimu.StatsTools.to_measurement
— Methodto_measurement(p::MonteCarloMeasurements.Particles) -> ::Measurements.measurement
Convert an uncertain number from MonteCarloMeasurements
to Measurements
format using the median as the central point. The new ±
boundaries will include the 68% quantile around the median.
Rimu.StatsTools.growth_witness
— Methodgrowth_witness(norm::AbstractArray, shift::AbstractArray, dt, [b]) -> g
growth_witness(df::DataFrame, [b]) -> g
Compute the growth witness
\[G^{(n)} = S^{(n)} - \frac{\vert\mathbf{c}^{(n+1)}\vert - \vert\mathbf{c}^{(n)}\vert}{\vert\mathbf{c}^{(n)}\vert d\tau},\]
where S
is the shift
and $\vert\mathbf{c}^{(n)}\vert ==$ norm[n, 1]
. Setting b ≥ 1
a sliding average over b
time steps is computed using smoothen()
.
Rimu.StatsTools.smoothen
— Methodsmoothen(noisy::AbstractVector, b)
Smoothen the array noisy
by averaging over a sliding window of length b
and wrapping noisy
periodically. The mean(noisy)
is preserved.
Rimu.StatsTools.growth_estimator
— Methodgrowth_estimator(
shift, wn, h, dτ;
skip = 0,
E_r = mean(shift[skip+1:end]),
weights = w_exp,
change_type = identity,
kwargs...,
) -> (; E_gr::Particles, k, blocks, success)
growth_estimator(df::DataFrame, h; kwargs ...)
Compute the growth estimator with reference energy E_r
by the reweighting technique described in Umirgar et al. (1993), see Eq. (20). shift
and wn
are equal length vectors containing the shift and walker number time series, respectively. Reweighting is done over h
time steps and length(shift) - skip
time steps are used for the blocking analysis done with ratio_of_means()
. dτ
is the time step and weights
a function that calulates the weights. See w_exp()
and w_lin()
.
\[E_{gr} = E_r - \frac{1}{dτ}\ln \frac{\sum_n w_{h+1}^{(n+1)} N_\mathrm{w}^{(n+1)}} {\sum_m w_{h}^{(m)} N_\mathrm{w}^{(m)}}\]
When h
is greater than the autocorrelation time scale of the shift
, then E_gr
is an unbiased but approximate estimator for the ground state energy $E_0$ with an error $\mathcal{O}(dτ^2)$ and potentially increased confidence intervals compared to the (biased) shift estimator. Error propagation is done with MonteCarloMeasurements
. Progagation through the logarithm can be modified by setting change_type
to to_measurement
in order to avoid NaN
results from negative outliers.
If success==true
the blocking analysis was successful in k-1
steps, using blocks
uncorrelated data points.
See also mixed_estimator()
.
Rimu.StatsTools.mixed_estimator
— Methodmixed_estimator(
hproj, vproj, shift, h, dτ;
skip = 0,
E_r = mean(shift[skip+1:end]),
weights = w_exp,
kwargs...,
) -> r::RatioBlockingResult
mixed_estimator(df::DataFrame, h; kwargs...)
Compute the mixed estimator by the reweighting technique described in Umirgar et al. (1993), Eq. (19)
\[E_\mathrm{mix} = \frac{\sum_n w_{h}^{(n)} (Ĥ'\mathrm{v})⋅\mathrm{c}^{(n)}} {\sum_m w_{h}^{(m)} \mathrm{v}⋅\mathrm{c}^{(m)}} ,\]
where the time series hproj ==
$(Ĥ'\mathrm{v})⋅\mathrm{c}^{(n)}$ and vproj ==
$\mathrm{v}⋅\mathrm{c}^{(m)}$ have the same length as shift
(See ReportingStrategy
on how to set these up). Reweighting is done over h
time steps and length(shift) - skip
time steps are used for the blocking analysis done with ratio_of_means()
. dτ
is the time step and weights
a function that calulates the weights. See w_exp()
and w_lin()
. Additional keyword arguments are passed on to ratio_of_means()
.
When h
is greater than the autocorrelation time scale of the shift
, then r.ratio
is an unbiased but approximate estimator for the ground state energy $E_0$ with an error $\mathcal{O}(dτ^2)$ and potentially increased confidence intervals compared to the unweighted ratio. Error propagation is done with MonteCarloMeasurements
. Results are returned as RatioBlockingResult
.
See also growth_estimator()
.
Rimu.StatsTools.w_exp
— Methodw_exp(shift, h, dτ; E_r = mean(shift), skip = 0)
Compute the weights for reweighting over h
time steps with reference energy E_r
from the exponential formula
\[w_h^{(n)} = \prod_{j=1}^h \exp[-dτ(S^{(q+n-j)}-E_r)] ,\]
where q = skip
.
See also w_lin()
, growth_estimator()
, mixed_estimator()
.
Rimu.StatsTools.w_lin
— Methodw_lin(shift, h, dτ; E_r = mean(shift), skip = 0)
Compute the weights for reweighting over h
time steps with reference energy E_r
from the linearised formula
\[w_h^{(n)} = \prod_{j=1}^h [1-dτ(S^{(q+n-j)}-E_r)] ,\]
where q = skip
.
See also w_exp()
, growth_estimator()
, mixed_estimator()
.
Additional docstrings
MonteCarloMeasurements.Particles
— MethodMonteCarloMeasurements.Particles(r::BlockingResult; mc_samples = 2000)
MonteCarloMeasurements.±(r::BlockingResult)
Convert a BlockingResult
into a Particles
object for nonlinear error propagation with MonteCarloMeasurements
.
Rimu.StatsTools.BlockingResult
— TypeBlockingResult(mean, err, err_err, p_cov, k, blocks)
Result of blocking_analysis()
.
Fields:
mean
: sample meanerr
: standard error (estimated standard deviation of the mean)err_err
: estimated uncertainty oferr
p_cov
: estimated pseudo covariance ofmean
, relevant for complex time seriesk::Int
: k-1 blocking steps were used to uncorrelate time seriesblocks::Int
: number of uncorrelated values after blocking
Has methods for mean_and_se
, Measurements.:±
, MonteCarloMeasurements.
Particles
, and Statistics.
cov
for Complex
data.
Measurements.measurement
— Methodmeasurement(r::BlockingResult)
Measurements.±(r::BlockingResult)
Convert a BlockingResult
into a Measurement
for linear error propagation with Measurements
.
Limitation: Does not account for covariance in complex BlockingResult
. Consider using MonteCarloMeasurements.Particles(r)
!
Rimu.StatsTools.blocker
— Methodblocker(v::Vector) -> new_v::Vector
Reblock the data by successively taking the mean of two adjacent data points to form a new vector with a half of the length(v)
. The last data point will be discarded if length(v)
is odd.
Rimu.StatsTools.blocks_with_m
— Methodblocks_with_m(v; corrected = true) -> (;blocks, mean, std_err, std_err_err, p_cov, mj)
Perform the blocking algorithm from Flyvberg and Peterson JCP (1989). Returns named tuple with the results from all blocking steps. See mtest()
.
Rimu.StatsTools.mtest
— Methodmtest(mj::AbstractVector; α = 0.01, warn = true) -> k
mtest(table; α = 0.01, warn = true) -> k
Hypothesis test for decorrelation of a time series after blocking transformations with significance level $1-α$ after Jonson PRE (2018). mj
or table.mj
is expected to be a vector with relevant $M_j$ values from a blocking analysis as obtained from blocks_with_m()
. Returns the row number k
where the M-test is passed. If the M-test has failed mtest()
returns the value -1
and optionally prints a warning message.
Statistics.cov
— Methodcov(r::BlockingResult{<:Complex})
Return the covariance matrix of the multivariate normal distribution approximating the uncertainty of the blocking result r
of a complex time series. See (https://en.wikipedia.org/wiki/Complexnormaldistribution).
Rimu.StatsTools.RatioBlockingResult
— TypeRatioBlockingResult(ratio, f, σ_f, δ_y, k, success)
Result of ratio_of_means()
.
Fields:
ratio::P
: ratio with uncertainties propagated by MonteCarloMeasurementsf::T
: ratio of meansσ_f::T
: std from linear propagationδ_y::T
: coefficient of variation for denominator (≤ 0.1 for normal approx)k::Int
: k-1 blocking steps were used to uncorrelate time seriesblocks::Int
: number of data values after blockingsuccess::Bool
: false if any of the blocking steps failed
Rimu.StatsTools.ratio_estimators
— Methodratio_estimators(x, y, [k]; corrected=true, mc_samples=10_000) -> (; r, f, σ_f, δ_y, n)
Estimators for the ratio of means mean(x)/mean(y)
. If k
is given, k-1
blocking steps are performed to remove internal correlations in the time series x
and y
. Otherwise these are assumed to be free of internal correlations. Correlations between x
and y
may be present and are taken into account.
Return values:
r::Particles
is the Monte Carlo sampled ratio estimator, seeParticles
f = mean(x)/mean(y)
σ_f
standard deviation off
from linear error propagation (normal approximation)δ_y = std(y)/mean(y)
coefficient of variation; < 0.1 for normal approximation to workn
: number of uncorrelated data used for uncertainty estimation
Rimu.StatsTools.x_by_y_linear
— Methodx_by_y_linear(μ_x,μ_y,σ_x,σ_y,ρ) -> f, σ_f
Linear error propagation for ratio f = x/y
assuming x
and y
are correlated normal random variables and assuming the ratio can be approximated as a normal distribution. See wikipedia and Díaz-Francés, Rubio (2013).
\[σ_f = \sqrt{\frac{σ_x}{μ_y}^2 + \frac{μ_x σ_y}{μ_y^2}^2 - \frac{2 ρ μ_x}{μ_y^3}}\]
Rimu.StatsTools.autocovariance
— Methodautocovariance(v::Vector,h::Int; corrected::Bool=true)
$\hat{\gamma}(h) =\frac{1}{n}\sum_{t=1}^{n-h}(v_{t+h}-\bar{v})(v_t-\bar{v})^*$ Calculate the autocovariance of dataset v
with a delay h
. If corrected
is true
(the default) then the sum is scaled with n-h
, whereas the sum is scaled with n
if corrected is false
where n = length(v)
.
Rimu.StatsTools.pseudo_cov
— Methodpseudo_cov(x, y; xmean = mean(x), ymean = mean(y), corrected = true)
Compute the pseudo covariance between collections x
and y
returning a scalar:
\[\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_{i} - \bar{y})\]
Optionally, precomputed means can be passed as keyword arguments. pseudo_cov(x,y)
is functionally equivalent to Statistics.cov(x, conj(y); corrected = false)
but it is found to be significantly faster and avoids allocations.
Index
Rimu.StatsTools
MonteCarloMeasurements.Particles
Rimu.StatsTools.BlockingResult
Rimu.StatsTools.RatioBlockingResult
Measurements.measurement
Rimu.StatsTools.autocovariance
Rimu.StatsTools.blocker
Rimu.StatsTools.blocking_analysis
Rimu.StatsTools.blocks_with_m
Rimu.StatsTools.growth_estimator
Rimu.StatsTools.growth_witness
Rimu.StatsTools.mean_and_se
Rimu.StatsTools.med_and_errs
Rimu.StatsTools.mixed_estimator
Rimu.StatsTools.mtest
Rimu.StatsTools.pseudo_cov
Rimu.StatsTools.ratio_estimators
Rimu.StatsTools.ratio_of_means
Rimu.StatsTools.ratio_with_errs
Rimu.StatsTools.replica_fidelity
Rimu.StatsTools.smoothen
Rimu.StatsTools.to_measurement
Rimu.StatsTools.w_exp
Rimu.StatsTools.w_lin
Rimu.StatsTools.x_by_y_linear
Statistics.cov