Module Rimu/StatsTools
The module Rimu/StatsTools
contains helper function for analysis and post processing of Monte Carlo data.
Blocking analysis
After equilibration, FCIQMC produces information about observables through correlated time series. In order to estimate the statistical errors the time series need to be decorrelated. The main workhorse for achieving this is the blocking_analysis
, which is based on the paper by Flyvberg and Peterson JCP (1989), and automated with the M test of Jonsson PRE (2018).
Analysing the stochastic errors of observables obtained from the ratio of sample means is done with ratio_of_means
, where error propagation of correlated uncertainties is done with the help of the package MonteCarloMeasurements
.
Many convenience functions are implemented for directly analysing data obtained from lomc!
as a DataFrame
. See, e.g., shift_estimator
and projected_energy
. Asymptotically unbiased estimators are implemented as mixed_estimator
, growth_estimator
and rayleigh_replica_estimator
.
Exported
Rimu.StatsTools
— ModuleTools for the statistical analysis of Monte Carlo data.
Exports:
blocking_analysis
blocking_analysis_data
ratio_of_means
growth_witness
smoothen
shift_estimator
projected_energy
variational_energy_estimator
growth_estimator
growth_estimator_analysis
mixed_estimator
mixed_estimator_analysis
rayleigh_replica_estimator
rayleigh_replica_estimator_analysis
val_and_errs
val
mean_and_se
Rimu.StatsTools.blocking_analysis
— Methodblocking_analysis(v::AbstractVector; α = 0.01, corrected = true, skip=0, warn=true)
-> BlockingResult(mean, err, err_err, p_cov, k, blocks)
Compute the sample mean mean
and estimate the standard deviation of the mean (standard error) err
of a correlated time series. It uses the blocking algorithm from Flyvberg and Peterson JCP (1989) and the M test of Jonsson PRE (2018) at significance level $1-α$.
Use skip
to skip the first skip
elements in v
. corrected
controls whether bias correction for variances is used. If decorrelating the time series fails according to the M test, NaN
is returned as the standard error and -1
for k
. The keyword argument warn
controls whether a warning message is logged.
The summary result is returned as a BlockingResult
. k - 1
is the number of blocking transformations required to pass the hypothesis test for an uncorrelated time series and err_err
the estimated standard error or err
.
The detailed results from each reblocking step can be obtained with blocking_analysis_data
.
See BlockingResult
, shift_estimator
, ratio_of_means
, blocking_analysis_data
.
Rimu.StatsTools.blocking_analysis_data
— Methodblocking_analysis_data(v::AbstractVector; kwargs...) ->
(; br::BlockingResult, df::DataFrame)
Perform a blocking_analysis
and return the summary result br
as well as a DataFrame
df
with information about the standard error in each blocking step.
For a description of the keyword arguments see blocking_analysis
.
Example
julia> data = smoothen(rand(10_000), 2^6); # generate correlated data
julia> br, df = blocking_analysis_data(data)
(br = BlockingResult{Float64}
mean = 0.5088 ± 0.0029
with uncertainty of ± 0.00023454488294744232
from 78 blocks after 7 transformations (k = 8).
, df = 13×6 DataFrame
Row │ blocks mean std_err std_err_err p_cov mj
│ Int64 Float64 Float64 Float64 Float64 Float64
─────┼──────────────────────────────────────────────────────────────────────
1 │ 10000 0.508806 0.000375044 2.6521e-6 1.40658e-7 9715.08
2 │ 5000 0.508806 0.000528547 5.28599e-6 2.79361e-7 4778.14
3 │ 2500 0.508806 0.000743386 1.05152e-5 5.52622e-7 2298.64
4 │ 1250 0.508806 0.00104064 2.08212e-5 1.08293e-6 1056.24
5 │ 625 0.508806 0.00144177 4.08121e-5 2.07871e-6 427.949
6 │ 312 0.508736 0.00194209 7.78707e-5 3.77171e-6 128.711
7 │ 156 0.508736 0.00247921 0.00014081 6.14647e-6 17.3075
8 │ 78 0.508736 0.00291063 0.000234545 8.47174e-6 0.731386
9 │ 39 0.508736 0.00284613 0.000326474 8.10046e-6 0.901054
10 │ 19 0.508241 0.0026998 0.000449967 7.28892e-6 2.85915
11 │ 9 0.507939 0.00359907 0.000899766 1.29533e-5 1.08644
12 │ 4 0.509327 0.00440559 0.00179857 1.94092e-5 0.0370381
13 │ 2 0.509327 0.00432708 0.00305971 1.87237e-5 0.125)
julia> using StatsPlots; unicodeplots();
julia> plot([br.k,br.k],[0.0,maximum(df.std_err.+df.std_err_err)], label="m test");
julia> @df df plot!(
1:length(:std_err), :std_err;
err=:std_err_err, xlabel="k", label="std err",
title="std err vs blocking steps"
)
⠀⠀⠀⠀⠀⠀⠀⠀⠀std err vs blocking steps⠀⠀⠀⠀⠀⠀⠀⠀
┌────────────────────────────────────────┐
0.00423501 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢠⠀⠀⠀⠀│ m test
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠀⠀⢸⠀⠀⠀⠀│ std err
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⠀⠀⢸⠀⠀⠀⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⠀⠀⢸⠀⠀⠀⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⡠⢺⠒⠒⢺⠀⠀⠀⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⡀⠀⠀⡆⣀⠤⡗⠉⠀⢸⠀⠀⢸⡆⠀⠀⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⡧⠤⠔⡗⠊⠉⡏⠀⠀⡇⠀⠀⢸⠀⠀⢸⢣⠀⠀⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠔⠁⡇⠀⠀⠁⠀⠀⠁⠀⠀⠁⠀⠀⠀⠀⠀⢸⠸⡀⠀⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠴⠁⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠸⠀⡇⠀⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠔⠁⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⠀⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠔⠊⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⣦⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⡠⠔⠒⠁⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢹⠀│
│⠀⠀⠀⢀⣀⠤⠒⠉⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⠀│
│⠀⠒⠉⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⠀│
-0.00012335 │⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠧⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤│
└────────────────────────────────────────┘
⠀0.64⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀k⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀13.36⠀
A vertical line at k==8
indicates the blocking step identified by hypothesis testing to decorrelate the time series data. The decorrelation length can thus be estimated at $2^{k-1} = 2^7 = 128$. Note that the data was correlated with a sliding window of $2^6$ steps.
Rimu.StatsTools.mean_and_se
— Methodmean_and_se(v::AbstractVector; α = 0.01, corrected::Bool=true, skip=0) -> mean, err
mean_and_se(r::BlockingResult) -> mean, err
Return the mean and standard error (as a tuple) of a time series obtained from blocking_analysis
. See also BlockingResult
.
Statistics.cov
— Methodcov(r::BlockingResult{<:Complex})
Return the covariance matrix of the multivariate normal distribution approximating the uncertainty of the blocking result r
of a complex time series. See (https://en.wikipedia.org/wiki/Complexnormaldistribution).
Rimu.StatsTools.ratio_of_means
— Methodratio_of_means(num, denom; α=0.01, corrected=true, mc_samples=nothing, skip=0, warn=true)
-> r::RatioBlockingResult
Estimate the ratio of mean(num)/mean(denom)
assuming that num
and denom
are possibly correlated time series, skipping the first skip
elements. A blocking analysis with m-test is used to uncorrelate the time series, see blocking_analysis
. The remaining standard error and correlation of the means is propagated using MonteCarloMeasurements
. The results are reported as a RatioBlockingResult
.
Robust estimates for the ratio are obtained from pmedian(r)
and confidence intervals from pquantile()
, e.g. pquantile(r, [0.025, 0.975])
for the 95% confidence interval.
Estimates from linear uncertainty propagation are returned as r.f
and r.σ_f
using x_by_y_linear
. The standard error estimate r.σ_f
should only be trusted when the coefficient of variation std(denom)/mean(denom)
is small: abs(r.δ_y) < 0.1
. Under this condition can the ratio be approximated as a normal distribution. See wikipedia and Díaz-Francés, Rubio (2013)
The keyword mc_samples
controls the number of samples used for error propagation by MonteCarloMeasurements
. Use nothing
for the default and Val(1000)
to set the number to 1000 samples in a type-consistent way.
The keyword warn
controls whether warning messages are logged when blocking fails or noisy denominators are encountered.
Note: to compute statistics on the RatioBlockingResult
, use functions pmedian
, pquantile
, pmiddle
, piterate
, pextrema
, pminimum
, pmaximum
, pmean
, and pcov
.
Rimu.StatsTools.errs
— Methoderrs(x; n=1, p=nothing, name=:err) -> (; err_l, err_u)
Return the lower and upper error bar for the uncertain value x
.
See val_and_errs
.
Rimu.StatsTools.to_measurement
— Methodto_measurement(p::MonteCarloMeasurements.Particles) -> ::Measurements.measurement
Convert an uncertain number from MonteCarloMeasurements
to Measurements
format using the median as the central point. The new ±
boundaries will include the 68% quantile around the median.
Rimu.StatsTools.val
— Methodval(x)
Return the best estimate value for an uncertain x
. Defaults to the median
for uncertain x
represented by a (sampled) distribution. Supports MonteCarloMeasurements
and Measurements
.
Rimu.StatsTools.val_and_errs
— Methodval_and_errs(x; n=1, p=nothing, name=:val) -> (;val, val_l, val_u)
Return the median and the lower and upper error bar for the uncertain value x
as a NamedTuple
. This is useful for plotting scripts. The interval [val - val_l, val + val_u]
represents the confidence interval at level n*σ
, or at probability p
. Setting p
overrides n
. Supports MonteCarloMeasurements
and Measurements
. The names in the NamedTuple
can be changed with name
.
Example:
julia> results = [blocking_analysis(i:0.1:2i+20) for i in 1:3]; # mock results
julia> v = val_and_errs.(results, name="res"); # Vector of NamedTuple's with standard errors
julia> DataFrame(v)
3×3 DataFrame
Row │ res res_l res_u
│ Float64 Float64 Float64
─────┼───────────────────────────
1 │ 11.5 1.7282 1.7282
2 │ 13.0 1.7282 1.7282
3 │ 14.5 1.78885 1.78885
See NamedTuple
, val
, errs
, BlockingResult
, RatioBlockingResult
.
Rimu.StatsTools.growth_witness
— Functiongrowth_witness(df::DataFrame, [b]; shift=:shift, norm=:norm, dτ=df.dτ[end], skip=0)
Calculate the growth witness directly from a DataFrame
returned by lomc!
. The keyword arguments shift
and norm
can be used to change the names of the relevant columns.
Rimu.StatsTools.growth_witness
— Methodgrowth_witness(shift::AbstractArray, norm::AbstractArray, dt, [b]; skip=0) -> g
growth_witness(df::DataFrame, [b]; skip=0) -> g
Compute the growth witness
\[G^{(n)} = S^{(n)} - \frac{\vert\mathbf{c}^{(n+1)}\vert - \vert\mathbf{c}^{(n)}\vert}{\vert\mathbf{c}^{(n)}\vert d\tau},\]
where S
is the shift
and $\vert\mathbf{c}^{(n)}\vert ==$ norm[n, 1]
. Setting b ≥ 1
a sliding average over b
time steps is computed using smoothen()
. The first skip
time steps are skipped. mean(growth_witness)
is approximately the same as growth_estimator
with h=0
.
See also growth_estimator
.
Rimu.StatsTools.smoothen
— Methodsmoothen(noisy::AbstractVector, b)
Smoothen the array noisy
by averaging over a sliding window of length b
and wrapping noisy
periodically. The mean(noisy)
is preserved.
Rimu.StatsTools.growth_estimator
— Methodgrowth_estimator(
shift, wn, h, dτ;
skip = 0,
E_r = mean(shift[skip+1:end]),
weights = w_exp,
change_type = identity,
kwargs...
) -> r::RatioBlockingResult
growth_estimator(
df::DataFrame, h;
shift_name=:shift,
norm_name=:norm,
dτ=df.dτ[end],
kwargs...
) -> r::RatioBlockingResult
Compute the growth estimator with reference energy E_r
by the reweighting technique described in Umrigar et al. (1993), see Eq. (20). shift
and wn
are equal length vectors containing the shift and walker number time series, respectively. Reweighting is done over h
time steps and length(shift) - skip
time steps are used for the blocking analysis done with ratio_of_means()
. dτ
is the time step and weights
a function that calulates the weights. See w_exp()
and w_lin()
.
\[E_{gr} = E_r - \frac{1}{dτ}\ln \frac{\sum_n w_{h+1}^{(n+1)} N_\mathrm{w}^{(n+1)}} {\sum_m w_{h}^{(m)} N_\mathrm{w}^{(m)}}\]
When h
is greater than the autocorrelation time scale of the shift
, then E_gr
(returned as r.ratio
) is an unbiased but approximate estimator for the ground state energy $E_0$ with an error $\mathcal{O}(dτ^2)$ and potentially increased confidence intervals compared to the (biased) shift estimator. Error propagation is done with MonteCarloMeasurements
. Progagation through the logarithm can be modified by setting change_type
to to_measurement
in order to avoid NaN
results from negative outliers.
If success==true
the blocking analysis was successful in k-1
steps, using blocks
uncorrelated data points.
The second method calculates the growth estimator directly from a DataFrame
returned by lomc!
. The keyword arguments shift_name
and norm_name
can be used to change the names of the relevant columns.
See also mixed_estimator()
and RatioBlockingResult
.
Rimu.StatsTools.growth_estimator_analysis
— Methodgrowth_estimator_analysis(df::DataFrame; kwargs...)
-> (;df_ge, correlation_estimate, se, se_l, se_u)
Compute the growth_estimator
on a DataFrame
df
returned from lomc!
repeatedly over a range of reweighting depths.
Returns a NamedTuple
with the fields
df_ge
:DataFrame
with reweighting depth andgrowth_estiamator
data. See example below.correlation_estimate
: estimated correlation time from blocking analysisse, se_l, se_u
:shift_estimator
and error
Keyword arguments
h_range
: The default is abouth_values
values from 0 to twice the estimated correlation timeh_values = 100
: minimum number of reweighting depthsskip = 0
: initial time steps to exclude from averagingthreading = Threads.nthreads() > 1
: iffalse
a progress meter is displayedshift_name = :shift
name of column indf
with shift datanorm_name = :norm
name of column indf
with walkernumber datawarn = true
whether to log warning messages when blocking fails or denominators are small
Example
df, _ = lomc!(...)
df_ge, correlation_estimate, se, se_l, se_u = growth_estimator_analysis(df; skip=5_000)
using StatsPlots
@df df_ge plot(_ -> se, :h, ribbon = (se_l, se_u), label = "⟨S⟩") # constant line and ribbon for shift estimator
@df df_ge plot!(:h, :val, ribbon = (:val_l, :val_u), label="E_gr") # growth estimator as a function of reweighting depth
xlabel!("h")
See also: growth_estimator
, mixed_estimator_analysis
.
Rimu.StatsTools.mixed_estimator
— Methodmixed_estimator(
hproj, vproj, shift, h, dτ;
skip = 0,
E_r = mean(shift[skip+1:end]),
weights = w_exp,
kwargs...
) -> r::RatioBlockingResult
mixed_estimator(
df::DataFrame, h;
hproj_name=:hproj,
vproj_name=:vproj,
shift_name=:shift,
dτ=df.dτ[end],
kwargs...
)
Compute the mixed estimator by the reweighting technique described in Umrigar et al. (1993), Eq. (19)
\[E_\mathrm{mix} = \frac{\sum_n w_{h}^{(n)} (Ĥ'\mathbf{v})⋅\mathbf{c}^{(n)}} {\sum_m w_{h}^{(m)} \mathbf{v}⋅\mathbf{c}^{(m)}} ,\]
where the time series hproj ==
$(Ĥ'\mathbf{v})⋅\mathbf{c}^{(n)}$ and vproj ==
$\mathbf{v}⋅\mathbf{c}^{(m)}$ have the same length as shift
(See ProjectedEnergy
on how to set these up). Reweighting is done over h
time steps and length(shift) - skip
time steps are used for the blocking analysis done with ratio_of_means()
. dτ
is the time step and weights
a function that calulates the weights. See w_exp()
and w_lin()
. Additional keyword arguments are passed on to ratio_of_means()
.
When h
is greater than the autocorrelation time scale of the shift
, then r.ratio
is an unbiased but approximate estimator for the ground state energy $E_0$ with an error $\mathcal{O}(dτ^2)$ and potentially increased confidence intervals compared to the unweighted ratio. Error propagation is done with MonteCarloMeasurements
. Results are returned as RatioBlockingResult
.
The second method calculates the mixed energy estimator directly from a DataFrame
returned by lomc!
. The keyword arguments hproj_name
, vproj_name
, and shift_name
can be used to change the names of the relevant columns.
See also growth_estimator()
.
Rimu.StatsTools.mixed_estimator_analysis
— Methodmixed_estimator_analysis(df::DataFrame; kwargs...)
-> (; df_me, correlation_estimate, se, se_l, se_u)
Compute the mixed_estimator
on a DataFrame
df
returned from lomc!
repeatedly over a range of reweighting depths.
Returns a NamedTuple
with the fields
df_me
:DataFrame
with reweighting depth andmixed_estiamator
data. See example below.correlation_estimate
: estimated correlation time from blocking analysisse, se_l, se_u
:shift_estimator
and error
Keyword arguments
h_range
: The default is abouth_values
values from 0 to twice the estimated correlation timeh_values = 100
: minimum number of reweighting depthsskip = 0
: initial time steps to exclude from averagingthreading = Threads.nthreads() > 1
: iffalse
a progress meter is displayedshift_name = :shift
name of column indf
with shift datahproj_name = :hproj
name of column indf
with operator overlap datavproj_name = :vproj
name of column indf
with projector overlap datawarn = true
whether to log warning messages when blocking fails or denominators are small
Example
df, _ = lomc!(...)
df_me, correlation_estimate, se, se_l, se_u = mixed_estimator_analysis(df; skip=5_000)
using StatsPlots
@df df_me plot(_ -> se, :h, ribbon = (se_l, se_u), label = "⟨S⟩") # constant line and ribbon for shift estimator
@df df_me plot!(:h, :val, ribbon = (:val_l, :val_u), label="E_mix") # mixed estimator as a function of reweighting depth
xlabel!("h")
See also: mixed_estimator
, growth_estimator_analysis
.
Rimu.StatsTools.projected_energy
— Methodprojected_energy(
df::DataFrame;
skip=0, hproj=:hproj, vproj=:vproj, kwargs...
) -> r::RatioBlockingResult
Compute the projected energy estimator
\[E_\mathrm{p} = \frac{\sum_n \mathbf{v}⋅Ĥ\mathbf{c}^{(n)}} {\sum_m \mathbf{v}⋅\mathbf{c}^{(m)}} ,\]
where the time series df.hproj ==
$\mathbf{v}⋅Ĥ\mathbf{c}^{(n)}$ and df.vproj ==
$\mathbf{v}⋅\mathbf{c}^{(m)}$ are taken from df
, skipping the first skip
entries (use post_step =
ProjectedEnergy()
to set these up in lomc!()
). projected_energy
is equivalent to mixed_estimator
with h=0
.
The keyword arguments hproj
and vproj
can be used to change the names of the relevant columns. Other kwargs
are passed on to ratio_of_means
. Returns a RatioBlockingResult
.
See NamedTuple
, val_and_errs
, val
, errs
for processing results.
Rimu.StatsTools.rayleigh_replica_estimator
— Methodrayleigh_replica_estimator(
op_ol, vec_ol, shift, h, dτ;
skip = 0,
E_r = mean(shift[skip+1:end]),
weights = w_exp,
kwargs...
) -> r::RatioBlockingResult
rayleigh_replica_estimator(
df::DataFrame;
shift_name="shift",
op_name="Op1",
vec_name="dot",
h=0,
skip=0,
Anorm=1,
kwargs...
) -> r::RatioBlockingResult
Compute the estimator of a Rayleigh quotient of operator $\hat{A}$ with reweighting,
\[A_\mathrm{est}(h) = \frac{\sum_{a<b} \sum_n w_{h,a}^{(n)} w_{h,b}^{(n)} \mathbf{c}_a^{(n)} \cdot \hat{A} \cdot \mathbf{c}_b^{(n)}} {\sum_{a<b} \sum_n w_{h,a}^{(n)} w_{h,b}^{(n)} \mathbf{c}_a^{(n)} \cdot \mathbf{c}_b^{(n)}},\]
using data from multiple replicas.
Argument op_ol
holds data for the operator overlap $\mathbf{c}_a^{(n)} \hat{A} \mathbf{c}_b^{(n)}$ and vec_ol
holds data for the vector overlap $\mathbf{c}_a^{(n)} \mathbf{c}_b^{(n)}$. They are of type Vector{Vector}
, with each element Vector
holding the data for a pair of replicas. Argument shift
is of type Vector{Vector}
, with each element Vector
holding the shift data for each individual replica.
The second method computes the Rayleigh quotient directly from a DataFrame
returned by lomc!
. The keyword arguments shift_name
, op_name
and vec_name
can be used to change the names of the relevant columns, see AllOverlaps
for default formatting. The operator overlap data can be scaled by a prefactor Anorm
. A specific reweighting depth can be set with keyword argument h
. The default is h = 0
which calculates the Rayleigh quotient without reweighting.
The reweighting is an extension of the mixed estimator using the reweighting technique described in Umrigar et al. (1993). Reweighting is done over h
time steps and length(shift) - skip
time steps are used for the blocking analysis done with ratio_of_means()
. dτ
is the time step and weights
a function that calulates the weights. See w_exp()
and w_lin()
. Additional keyword arguments are passed on to ratio_of_means()
.
Error propagation is done with MonteCarloMeasurements
. Results are returned as RatioBlockingResult
.
See also mixed_estimator
, growth_estimator()
.
Rimu.StatsTools.rayleigh_replica_estimator_analysis
— Methodrayleigh_replica_estimator_analysis(df::DataFrame; kwargs...)
-> (; df_rre, df_se)
Compute the rayleigh_replica_estimator
on a DataFrame
df
returned from lomc!
repeatedly over a range of reweighting depths.
Returns a NamedTuple
with the fields
df_rre
:DataFrame
with reweighting depth andrayleigh_replica_estimator
data. See example below.df_se
:DataFrame
withshift_estimator
output, one row per replica
Keyword arguments
h_range
: The default is abouth_values
values from 0 to twice the estimated correlation timeh_values = 100
: minimum number of reweighting depthsskip = 0
: initial time steps to exclude from averagingthreading = Threads.nthreads() > 1
: iffalse
a progress meter is displayedshift_name = "shift"
: shift data corresponding to column indf
with names<shift>_1
, ...op_name = "Op1"
: name of operator overlap corresponding to column indf
with namesc1_<op_ol>_c2
, ...vec_name = "dot"
: name of vector-vector overlap corresponding to column indf
with namesc1_<vec_ol>_c2
, ...Anorm = 1
: a scalar prefactor to scale the operator overlap datawarn = true
: whether to log warning messages when blocking fails or denominators are small
Example
df, _ = lomc!(...)
df_rre, df_se = rayleigh_replica_estimator_analysis(df; skip=5_000)
using StatsPlots
@df df_rre plot(_ -> se, :h, ribbon = (se_l, se_u), label = "⟨S⟩") # constant line and ribbon for shift estimator
@df df_rre plot!(:h, :val, ribbon = (:val_l, :val_u), label="E_mix") # Rayleigh quotient estimator as a function of reweighting depth
xlabel!("h")
See also: rayleigh_replica_estimator
, mixed_estimator_analysis
, AllOverlaps
.
Rimu.StatsTools.shift_estimator
— Methodshift_estimator(df::DataFrame; shift=:shift, kwargs...) -> r::BlockingResult
Return the shift estimator from the data in df.shift
. The keyword argument shift
can be used to change the name of the relevant column. Other keyword arguments are passed on to blocking_analysis
. Returns a BlockingResult
.
See also growth_estimator
, projected_energy
.
Rimu.StatsTools.w_exp
— Methodw_exp(shift, h, dτ; E_r = mean(shift), skip = 0)
Compute the weights for reweighting over h
time steps with reference energy E_r
from the exponential formula
\[w_h^{(n)} = \prod_{j=1}^h \exp[-dτ(S^{(q+n-j)}-E_r)] ,\]
where q = skip
.
See also w_lin()
, growth_estimator()
, mixed_estimator()
.
Rimu.StatsTools.w_lin
— Methodw_lin(shift, h, dτ; E_r = mean(shift), skip = 0)
Compute the weights for reweighting over h
time steps with reference energy E_r
from the linearised formula
\[w_h^{(n)} = \prod_{j=1}^h [1-dτ(S^{(q+n-j)}-E_r)] ,\]
where q = skip
.
See also w_exp()
, growth_estimator()
, mixed_estimator()
.
Rimu.StatsTools.replica_fidelity
— Methodreplica_fidelity(df::DataFrame; p_field = :hproj, skip = 0)
Compute the fidelity of the average coefficient vector and the projector defined in p_field
from the result of replica lomc!
passed as argument df
, using replicas _1
and _2
. Calls ratio_of_means()
to perform a blocking analysis on a ratio of the means of separate time series and returns a RatioBlockingResult
. The first skip
steps in the time series are skipped.
The fidelity of states |ψ⟩
and |ϕ⟩
is defined as
\[F(ψ,ϕ) = \frac{|⟨ψ|ϕ⟩|^2}{⟨ψ|ψ⟩⟨ϕ|ϕ⟩} .\]
Specifically, replica_fidelity
computes
\[F(\mathbf{v},⟨\mathbf{c}⟩) = \frac{⟨(\mathbf{c}_1⋅\mathbf{v})(\mathbf{v}⋅\mathbf{c}_1)⟩} {⟨\mathbf{c}_1⋅\mathbf{c}_1⟩} ,\]
where v
is the projector specified by p_field
, which is assumed to be normalised to unity with the two-norm (i.e. v⋅v == 1
), and $\mathbf{c}_1$ and $\mathbf{c}_2$ are two replica coefficient vectors.
Rimu.StatsTools.variational_energy_estimator
— Methodvariational_energy_estimator(shifts, overlaps; kwargs...)
variational_energy_estimator(df::DataFrame; max_replicas=:all, kwargs...)
-> r::RatioBlockingResult
Compute the variational energy estimator from the replica time series of the shifts
and coefficient vector overlaps
by blocking analysis. The keyword argument max_replicas
can be used to constrain the number of replicas processed to be smaller than all available in df
. Other keyword arguments are passed on to ratio_of_means()
. Returns a RatioBlockingResult
.
An estimator for the variational energy
\[\frac{⟨\mathbf{c}⟩^† \mathbf{H}⟨\mathbf{c}⟩}{⟨\mathbf{c}⟩^†⟨\mathbf{c}⟩}\]
is calculated from
\[Ē_{v} = \frac{\sum_{a<b}^R \overline{(S_a+S_b) \mathbf{c}_a^† \mathbf{c}_b}} {2\sum_{a<b}^R \overline{\mathbf{c}_a^† \mathbf{c}_b}} ,\]
where the sum goes over distinct pairs out of the $R$ replicas. See arXiv:2103.07800.
The DataFrame
version can extract the relevant information from the result of lomc!
. Set up lomc!
with the keyword argument replica = AllOverlaps(R)
and R ≥ 2
. If passing shifts
and overlaps
, the data has to be arranged in the correct order (as provided in the DataFrame
version).
See AllOverlaps
.
Additional docstrings
MonteCarloMeasurements.Particles
— MethodMonteCarloMeasurements.Particles(r::BlockingResult; mc_samples = 2000)
MonteCarloMeasurements.±(r::BlockingResult)
Convert a BlockingResult
into a Particles
object for nonlinear error propagation with MonteCarloMeasurements
.
Rimu.StatsTools.BlockingResult
— TypeBlockingResult(mean, err, err_err, p_cov, k, blocks)
Result of blocking_analysis
.
Fields:
mean
: sample meanerr
: standard error (estimated standard deviation of the mean)err_err
: estimated uncertainty oferr
p_cov
: estimated pseudo covariance ofmean
, relevant for complex time seriesk::Int
: k-1 blocking steps were used to uncorrelate time seriesblocks::Int
: number of uncorrelated values after blocking
Has methods for NamedTuple
, val_and_errs
, val
, errs
, mean_and_se
, Measurements.:±
, MonteCarloMeasurements.Particles
, and Statistics.cov
for Complex
data.
Example:
julia> blocking_analysis(smoothen(randn(2^10), 2^5))
BlockingResult{Float64}
mean = -0.026 ± 0.029
with uncertainty of ± 0.003638545517264226
from 32 blocks after 5 transformations (k = 6).
Measurements.measurement
— Methodmeasurement(r::BlockingResult)
Measurements.±(r::BlockingResult)
Convert a BlockingResult
into a Measurement
for linear error propagation with Measurements
.
Limitation: Does not account for covariance in complex BlockingResult
. Consider using MonteCarloMeasurements.Particles(r)
!
Rimu.StatsTools.blocker
— Methodblocker(v::Vector) -> new_v::Vector
Reblock the data by successively taking the mean of two adjacent data points to form a new vector with a half of the length(v)
. The last data point will be discarded if length(v)
is odd.
Rimu.StatsTools.blocks_with_m
— Methodblocks_with_m(v; corrected = true) -> (;blocks, mean, std_err, std_err_err, p_cov, mj)
Perform the blocking algorithm from Flyvberg and Peterson JCP (1989). Returns named tuple with the results from all blocking steps. See mtest()
.
Rimu.StatsTools.mtest
— Methodmtest(mj::AbstractVector; α = 0.01) -> k
mtest(table::NamedTuple; α = 0.01) -> k
Hypothesis test for decorrelation of a time series after blocking transformations with significance level $1-α$ after Jonson PRE (2018). mj
or table.mj
is expected to be a vector with relevant $M_j$ values from a blocking analysis as obtained from blocks_with_m()
. Returns the row number k
where the M-test is passed. If the M-test has failed mtest()
returns the value -1
.
Rimu.StatsTools.RatioBlockingResult
— TypeRatioBlockingResult(ratio, f, σ_f, δ_y, k, success)
Result of ratio_of_means()
.
Fields:
ratio::P
: ratio with uncertainties propagated byMonteCarloMeasurements
f::T
: ratio of meansσ_f::T
: std from linear propagationδ_y::T
: coefficient of variation for denominator (≤ 0.1 for normal approx)k::Int
: k-1 blocking steps were used to uncorrelate time seriesblocks::Int
: number of data values after blockingsuccess::Bool
: false if any of the blocking steps failed
Has methods for NamedTuple
, val_and_errs
, val
, errs
.
Note: to compute statistics on the RatioBlockingResult
, use functions pmedian
, pquantile
, pmiddle
, piterate
, pextrema
, pminimum
, pmaximum
, pmean
, and pcov
.
Rimu.StatsTools.particles
— Methodparticles(samples, μ, σ)
particles(samples, μ::AbstractVector, Σ::AbstractMatrix)
Return Particles
object from MonteCarloMeasurements
with single- or multivariate normal distribution. Zero variance parameters are supported.
Rimu.StatsTools.particles
— Methodparticles(samples, d)
particles(::Nothing, d)
particles(::Val{T}, d) where T
Return Particles
object from MonteCarloMeasurements
using a type-stable constructor if possible. Pass nothing
for the default number of particles or Val(1_000)
for using 1000 particles in a type-stable manner. If d
is a Particles
object it is passed through without re-sampling.
Rimu.StatsTools.ratio_estimators
— Methodratio_estimators(x, y, [k]; corrected=true, mc_samples=10_000) -> (; r, f, σ_f, δ_y, n)
Estimators for the ratio of means mean(x)/mean(y)
. If k
is given, k-1
blocking steps are performed to remove internal correlations in the time series x
and y
. Otherwise these are assumed to be free of internal correlations. Correlations between x
and y
may be present and are taken into account.
Return values:
r::Particles
is the Monte Carlo sampled ratio estimator, seeParticles
f = mean(x)/mean(y)
σ_f
standard deviation off
from linear error propagation (normal approximation)δ_y = std(y)/mean(y)
coefficient of variation; < 0.1 for normal approximation to workn
: number of uncorrelated data used for uncertainty estimation
Rimu.StatsTools.x_by_y_linear
— Methodx_by_y_linear(μ_x,μ_y,σ_x,σ_y,ρ) -> f, σ_f
Linear error propagation for ratio f = x/y
assuming x
and y
are correlated normal random variables and assuming the ratio can be approximated as a normal distribution. See wikipedia and Díaz-Francés, Rubio (2013).
\[σ_f = \sqrt{\frac{σ_x}{μ_y}^2 + \frac{μ_x σ_y}{μ_y^2}^2 - \frac{2 ρ μ_x}{μ_y^3}}\]
Core.NamedTuple
— MethodNamedTuple(x::BlockingResult; n=1, p=nothing, name=:val)
NamedTuple(x::RatioBlockingResult; n=1, p=nothing, name=:val)
Return a named tuple with value and error bars (see val_and_errs
) as well as additional numerical fields relevant for x
.
Example:
julia> results = [blocking_analysis(i:0.1:2i+20) for i in 1:3]; # mock results
julia> df = NamedTuple.(results, name=:res)|>DataFrame
3×7 DataFrame
Row │ res res_l res_u res_err_err res_p_cov res_k res_blocks
│ Float64 Float64 Float64 Float64 Float64 Int64 Int64
─────┼──────────────────────────────────────────────────────────────────────
1 │ 11.5 1.7282 1.7282 0.352767 2.98667 5 13
2 │ 13.0 1.7282 1.7282 0.352767 2.98667 5 13
3 │ 14.5 1.78885 1.78885 0.350823 3.2 5 14
julia> rbs = ratio_of_means(1 .+sin.(1:0.1:11),2 .+sin.(2:0.1:12)); # more mock results
julia> [NamedTuple(rbs),]|>DataFrame
1×9 DataFrame
Row │ val val_l val_u val_f val_σ_f val_δ_y val_k val_blocks val_success
│ Float64 Float64 Float64 Float64 Float64 Float64 Int64 Int64 Bool
─────┼────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 0.581549 0.0925669 0.0812292 0.560532 0.0875548 0.0875548 4 12 true
See val_and_errs
, val
, errs
, BlockingResult
, RatioBlockingResult
.
Rimu.StatsTools.autocovariance
— Methodautocovariance(v::Vector,h::Int; corrected::Bool=true)
$\hat{\gamma}(h) =\frac{1}{n}\sum_{t=1}^{n-h}(v_{t+h}-\bar{v})(v_t-\bar{v})^*$ Calculate the autocovariance of dataset v
with a delay h
. If corrected
is true
(the default) then the sum is scaled with n-h
, whereas the sum is scaled with n
if corrected is false
where n = length(v)
.
Rimu.StatsTools.pseudo_cov
— Methodpseudo_cov(x, y; xmean = mean(x), ymean = mean(y), corrected = true)
Compute the pseudo covariance between collections x
and y
returning a scalar:
\[\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_{i} - \bar{y})\]
Optionally, precomputed means can be passed as keyword arguments. pseudo_cov(x,y)
is functionally equivalent to Statistics.cov(x, conj(y); corrected = false)
but it is found to be significantly faster and avoids allocations.
Index
Rimu.StatsTools
Core.NamedTuple
MonteCarloMeasurements.Particles
Rimu.StatsTools.BlockingResult
Rimu.StatsTools.RatioBlockingResult
Measurements.measurement
Rimu.StatsTools.autocovariance
Rimu.StatsTools.blocker
Rimu.StatsTools.blocking_analysis
Rimu.StatsTools.blocking_analysis_data
Rimu.StatsTools.blocks_with_m
Rimu.StatsTools.errs
Rimu.StatsTools.growth_estimator
Rimu.StatsTools.growth_estimator_analysis
Rimu.StatsTools.growth_witness
Rimu.StatsTools.growth_witness
Rimu.StatsTools.mean_and_se
Rimu.StatsTools.mixed_estimator
Rimu.StatsTools.mixed_estimator_analysis
Rimu.StatsTools.mtest
Rimu.StatsTools.particles
Rimu.StatsTools.particles
Rimu.StatsTools.projected_energy
Rimu.StatsTools.pseudo_cov
Rimu.StatsTools.ratio_estimators
Rimu.StatsTools.ratio_of_means
Rimu.StatsTools.rayleigh_replica_estimator
Rimu.StatsTools.rayleigh_replica_estimator_analysis
Rimu.StatsTools.replica_fidelity
Rimu.StatsTools.shift_estimator
Rimu.StatsTools.smoothen
Rimu.StatsTools.to_measurement
Rimu.StatsTools.val
Rimu.StatsTools.val_and_errs
Rimu.StatsTools.variational_energy_estimator
Rimu.StatsTools.w_exp
Rimu.StatsTools.w_lin
Rimu.StatsTools.x_by_y_linear
Statistics.cov