Module `Rimu/StatsTools`

The module Rimu/StatsTools contains helper function for analysis and post processing of Monte Carlo data.

Blocking analysis

After equilibration, FCIQMC produces information about observables through correlated time series. In order to estimate the statistical errors the time series need to be decorrelated. The main workhorse for achieving this is the blocking_analysis, which is based on the paper by Flyvberg and Peterson JCP (1989), and automated with the M test of Jonsson PRE (2018).

Analysing the stochastic errors of observables obtained from the ratio of sample means is done with ratio_of_means, where error propagation of correlated uncertainties is done with the help of the package MonteCarloMeasurements.

Many convenience functions are implemented for directly analysing data obtained from lomc! as a DataFrame. See, e.g., shift_estimator and projected_energy. Asymptotically unbiased estimators are implemented as mixed_estimator, growth_estimator and rayleigh_replica_estimator.

Exported

Rimu.StatsTools — Module

Tools for the statistical analysis of Monte Carlo data.

Exports:

Rimu.StatsTools.blocking_analysis — Method

blocking_analysis(v::AbstractVector; α = 0.01, corrected = true, skip=0, warn=true)
-> BlockingResult(mean, err, err_err, p_cov, k, blocks)

Compute the sample mean mean and estimate the standard deviation of the mean (standard error) err of a correlated time series. It uses the blocking algorithm from Flyvberg and Peterson JCP (1989) and the M test of Jonsson PRE (2018) at significance level $1-α$.

Use skip to skip the first skip elements in v. corrected controls whether bias correction for variances is used. If decorrelating the time series fails according to the M test, NaN is returned as the standard error and -1 for k. The keyword argument warn controls whether a warning message is logged.

The summary result is returned as a BlockingResult. k - 1 is the number of blocking transformations required to pass the hypothesis test for an uncorrelated time series and err_err the estimated standard error or err.

The detailed results from each reblocking step can be obtained with blocking_analysis_data.

See BlockingResult, shift_estimator, ratio_of_means, blocking_analysis_data.

Rimu.StatsTools.blocking_analysis_data — Method

blocking_analysis_data(v::AbstractVector; kwargs...) ->
(; br::BlockingResult, df::DataFrame)

Perform a blocking_analysis and return the summary result br as well as a DataFrame df with information about the standard error in each blocking step.

For a description of the keyword arguments see blocking_analysis.

Example

julia> data = smoothen(rand(10_000), 2^6); # generate correlated data

julia> br, df = blocking_analysis_data(data)
(br = BlockingResult{Float64}
  mean = 0.5088 ± 0.0029
  with uncertainty of ± 0.00023454488294744232
  from 78 blocks after 7 transformations (k = 8).
, df = 13×6 DataFrame
 Row │ blocks  mean      std_err      std_err_err  p_cov       mj
     │ Int64   Float64   Float64      Float64      Float64     Float64
─────┼──────────────────────────────────────────────────────────────────────
   1 │  10000  0.508806  0.000375044  2.6521e-6    1.40658e-7  9715.08
   2 │   5000  0.508806  0.000528547  5.28599e-6   2.79361e-7  4778.14
   3 │   2500  0.508806  0.000743386  1.05152e-5   5.52622e-7  2298.64
   4 │   1250  0.508806  0.00104064   2.08212e-5   1.08293e-6  1056.24
   5 │    625  0.508806  0.00144177   4.08121e-5   2.07871e-6   427.949
   6 │    312  0.508736  0.00194209   7.78707e-5   3.77171e-6   128.711
   7 │    156  0.508736  0.00247921   0.00014081   6.14647e-6    17.3075
   8 │     78  0.508736  0.00291063   0.000234545  8.47174e-6     0.731386
   9 │     39  0.508736  0.00284613   0.000326474  8.10046e-6     0.901054
  10 │     19  0.508241  0.0026998    0.000449967  7.28892e-6     2.85915
  11 │      9  0.507939  0.00359907   0.000899766  1.29533e-5     1.08644
  12 │      4  0.509327  0.00440559   0.00179857   1.94092e-5     0.0370381
  13 │      2  0.509327  0.00432708   0.00305971   1.87237e-5     0.125)

julia> using StatsPlots; unicodeplots();

julia> plot([br.k,br.k],[0.0,maximum(df.std_err.+df.std_err_err)], label="m test");

julia> @df df plot!(
           1:length(:std_err), :std_err;
           err=:std_err_err, xlabel="k", label="std err",
           title="std err vs blocking steps"
       )
               ⠀⠀⠀⠀⠀⠀⠀⠀⠀std err vs blocking steps⠀⠀⠀⠀⠀⠀⠀⠀
               ┌────────────────────────────────────────┐
    0.00423501 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢠⠀⠀⠀⠀│ m test
               │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠀⠀⢸⠀⠀⠀⠀│ std err
               │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⠀⠀⢸⠀⠀⠀⠀│
               │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⠀⠀⢸⠀⠀⠀⠀│
               │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⡠⢺⠒⠒⢺⠀⠀⠀⠀│
               │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⡀⠀⠀⡆⣀⠤⡗⠉⠀⢸⠀⠀⢸⡆⠀⠀⠀│
               │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⡧⠤⠔⡗⠊⠉⡏⠀⠀⡇⠀⠀⢸⠀⠀⢸⢣⠀⠀⠀│
               │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠔⠁⡇⠀⠀⠁⠀⠀⠁⠀⠀⠁⠀⠀⠀⠀⠀⢸⠸⡀⠀⠀│
               │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠴⠁⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠸⠀⡇⠀⠀│
               │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠔⠁⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⠀⠀│
               │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠔⠊⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⣦⠀│
               │⠀⠀⠀⠀⠀⠀⠀⠀⡠⠔⠒⠁⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢹⠀│
               │⠀⠀⠀⢀⣀⠤⠒⠉⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⠀│
               │⠀⠒⠉⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⠀│
   -0.00012335 │⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠧⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤│
               └────────────────────────────────────────┘
               ⠀0.64⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀k⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀13.36⠀

A vertical line at k==8 indicates the blocking step identified by hypothesis testing to decorrelate the time series data. The decorrelation length can thus be estimated at $2^{k-1} = 2^7 = 128$. Note that the data was correlated with a sliding window of $2^6$ steps.

See blocking_analysis, BlockingResult.

Rimu.StatsTools.mean_and_se — Method

mean_and_se(v::AbstractVector; α = 0.01, corrected::Bool=true, skip=0) -> mean, err
mean_and_se(r::BlockingResult) -> mean, err

Return the mean and standard error (as a tuple) of a time series obtained from blocking_analysis. See also BlockingResult.

Statistics.cov — Method

cov(r::BlockingResult{<:Complex})

Return the covariance matrix of the multivariate normal distribution approximating the uncertainty of the blocking result r of a complex time series. See (https://en.wikipedia.org/wiki/Complexnormaldistribution).

Rimu.StatsTools.ratio_of_means — Method

ratio_of_means(num, denom; α=0.01, corrected=true, mc_samples=nothing, skip=0, warn=true)
-> r::RatioBlockingResult

Estimate the ratio of mean(num)/mean(denom) assuming that num and denom are possibly correlated time series, skipping the first skip elements. A blocking analysis with m-test is used to uncorrelate the time series, see blocking_analysis. The remaining standard error and correlation of the means is propagated using MonteCarloMeasurements. The results are reported as a RatioBlockingResult.

Robust estimates for the ratio are obtained from pmedian(r) and confidence intervals from pquantile(), e.g. pquantile(r, [0.025, 0.975]) for the 95% confidence interval.

Estimates from linear uncertainty propagation are returned as r.f and r.σ_f using x_by_y_linear. The standard error estimate r.σ_f should only be trusted when the coefficient of variation std(denom)/mean(denom) is small: abs(r.δ_y) < 0.1. Under this condition can the ratio be approximated as a normal distribution. See wikipedia and Díaz-Francés, Rubio (2013)

The keyword mc_samples controls the number of samples used for error propagation by MonteCarloMeasurements. Use nothing for the default and Val(1000) to set the number to 1000 samples in a type-consistent way.

The keyword warn controls whether warning messages are logged when blocking fails or noisy denominators are encountered.

Note: to compute statistics on the RatioBlockingResult, use functions pmedian, pquantile, pmiddle, piterate, pextrema, pminimum, pmaximum, pmean, and pcov.

Rimu.StatsTools.errs — Method

errs(x; n=1, p=nothing, name=:err) -> (; err_l, err_u)

Return the lower and upper error bar for the uncertain value x.

See val_and_errs.

Rimu.StatsTools.to_measurement — Method

to_measurement(p::MonteCarloMeasurements.Particles) -> ::Measurements.measurement

Convert an uncertain number from MonteCarloMeasurements to Measurements format using the median as the central point. The new ± boundaries will include the 68% quantile around the median.

Rimu.StatsTools.val — Method

val(x)

Return the best estimate value for an uncertain x. Defaults to the median for uncertain x represented by a (sampled) distribution. Supports MonteCarloMeasurements and Measurements.

See errs, BlockingResult, RatioBlockingResult.

Rimu.StatsTools.val_and_errs — Method

val_and_errs(x; n=1, p=nothing, name=:val) -> (;val, val_l, val_u)

Return the median and the lower and upper error bar for the uncertain value x as a NamedTuple. This is useful for plotting scripts. The interval [val - val_l, val + val_u] represents the confidence interval at level n*σ, or at probability p. Setting p overrides n. Supports MonteCarloMeasurements and Measurements. The names in the NamedTuple can be changed with name.

Example:

julia> results = [blocking_analysis(i:0.1:2i+20) for i in 1:3]; # mock results

julia> v = val_and_errs.(results, name="res"); # Vector of NamedTuple's with standard errors

julia> DataFrame(v)
3×3 DataFrame
 Row │ res      res_l    res_u
     │ Float64  Float64  Float64
─────┼───────────────────────────
   1 │    11.5  1.7282   1.7282
   2 │    13.0  1.7282   1.7282
   3 │    14.5  1.78885  1.78885

See NamedTuple, val, errs, BlockingResult, RatioBlockingResult.

Rimu.StatsTools.growth_witness — Function

growth_witness(df::DataFrame, [b]; shift=:shift, norm=:norm, dτ=df.dτ[end], skip=0)

Calculate the growth witness directly from a DataFrame returned by lomc!. The keyword arguments shift and norm can be used to change the names of the relevant columns.

Rimu.StatsTools.growth_witness — Method

growth_witness(shift::AbstractArray, norm::AbstractArray, dt, [b]; skip=0) -> g
growth_witness(df::DataFrame, [b]; skip=0) -> g

Compute the growth witness

\[G^{(n)} = S^{(n)} - \frac{\vert\mathbf{c}^{(n+1)}\vert - \vert\mathbf{c}^{(n)}\vert}{\vert\mathbf{c}^{(n)}\vert d\tau},\]

where S is the shift and $\vert\mathbf{c}^{(n)}\vert ==$ norm[n, 1]. Setting b ≥ 1 a sliding average over b time steps is computed using smoothen(). The first skip time steps are skipped. mean(growth_witness) is approximately the same as growth_estimator with h=0.

See also growth_estimator.

Rimu.StatsTools.smoothen — Method

smoothen(noisy::AbstractVector, b)

Smoothen the array noisy by averaging over a sliding window of length b and wrapping noisy periodically. The mean(noisy) is preserved.

Rimu.StatsTools.growth_estimator — Method

growth_estimator(
    shift, wn, h, dτ;
    skip = 0,
    E_r = mean(shift[skip+1:end]),
    weights = w_exp,
    change_type = identity,
    kwargs...
) -> r::RatioBlockingResult
growth_estimator(
    df::DataFrame, h; 
    shift_name=:shift, 
    norm_name=:norm, 
    dτ=df.dτ[end], 
    kwargs...
) -> r::RatioBlockingResult

Compute the growth estimator with reference energy E_r by the reweighting technique described in Umrigar et al. (1993), see Eq. (20). shift and wn are equal length vectors containing the shift and walker number time series, respectively. Reweighting is done over h time steps and length(shift) - skip time steps are used for the blocking analysis done with ratio_of_means(). dτ is the time step and weights a function that calulates the weights. See w_exp() and w_lin().

\[E_{gr} = E_r - \frac{1}{dτ}\ln \frac{\sum_n w_{h+1}^{(n+1)} N_\mathrm{w}^{(n+1)}} {\sum_m w_{h}^{(m)} N_\mathrm{w}^{(m)}}\]

When h is greater than the autocorrelation time scale of the shift, then E_gr (returned as r.ratio) is an unbiased but approximate estimator for the ground state energy $E_0$ with an error $\mathcal{O}(dτ^2)$ and potentially increased confidence intervals compared to the (biased) shift estimator. Error propagation is done with MonteCarloMeasurements. Progagation through the logarithm can be modified by setting change_type to to_measurement in order to avoid NaN results from negative outliers.

If success==true the blocking analysis was successful in k-1 steps, using blocks uncorrelated data points.

The second method calculates the growth estimator directly from a DataFrame returned by lomc!. The keyword arguments shift_name and norm_name can be used to change the names of the relevant columns.

See also mixed_estimator() and RatioBlockingResult.

Rimu.StatsTools.growth_estimator_analysis — Method

growth_estimator_analysis(df::DataFrame; kwargs...)
-> (;df_ge, correlation_estimate, se, se_l, se_u)

Compute the growth_estimator on a DataFrame df returned from lomc! repeatedly over a range of reweighting depths.

Returns a NamedTuple with the fields

df_ge: DataFrame with reweighting depth and growth_estiamator data. See example below.
correlation_estimate: estimated correlation time from blocking analysis
se, se_l, se_u: shift_estimator and error

Keyword arguments

h_range: The default is about h_values values from 0 to twice the estimated correlation time
h_values = 100: minimum number of reweighting depths
skip = 0: initial time steps to exclude from averaging
threading = Threads.nthreads() > 1: if false a progress meter is displayed
shift_name = :shift name of column in df with shift data
norm_name = :norm name of column in df with walkernumber data
warn = true whether to log warning messages when blocking fails or denominators are small

Example

df, _ = lomc!(...)
df_ge, correlation_estimate, se, se_l, se_u = growth_estimator_analysis(df; skip=5_000)

using StatsPlots
@df df_ge plot(_ -> se, :h, ribbon = (se_l, se_u), label = "⟨S⟩") # constant line and ribbon for shift estimator
@df df_ge plot!(:h, :val, ribbon = (:val_l, :val_u), label="E_gr") # growth estimator as a function of reweighting depth
xlabel!("h")

See also: growth_estimator, mixed_estimator_analysis.

Rimu.StatsTools.mixed_estimator — Method

mixed_estimator(
    hproj, vproj, shift, h, dτ;
    skip = 0,
    E_r = mean(shift[skip+1:end]),
    weights = w_exp,
    kwargs...
) -> r::RatioBlockingResult
mixed_estimator(
    df::DataFrame, h;
    hproj_name=:hproj, 
    vproj_name=:vproj, 
    shift_name=:shift, 
    dτ=df.dτ[end], 
    kwargs...
)

Compute the mixed estimator by the reweighting technique described in Umrigar et al. (1993), Eq. (19)

\[E_\mathrm{mix} = \frac{\sum_n w_{h}^{(n)} (Ĥ'\mathbf{v})⋅\mathbf{c}^{(n)}} {\sum_m w_{h}^{(m)} \mathbf{v}⋅\mathbf{c}^{(m)}} ,\]

where the time series hproj == $(Ĥ'\mathbf{v})⋅\mathbf{c}^{(n)}$ and vproj == $\mathbf{v}⋅\mathbf{c}^{(m)}$ have the same length as shift (See ProjectedEnergy on how to set these up). Reweighting is done over h time steps and length(shift) - skip time steps are used for the blocking analysis done with ratio_of_means(). dτ is the time step and weights a function that calulates the weights. See w_exp() and w_lin(). Additional keyword arguments are passed on to ratio_of_means().

When h is greater than the autocorrelation time scale of the shift, then r.ratio is an unbiased but approximate estimator for the ground state energy $E_0$ with an error $\mathcal{O}(dτ^2)$ and potentially increased confidence intervals compared to the unweighted ratio. Error propagation is done with MonteCarloMeasurements. Results are returned as RatioBlockingResult.

The second method calculates the mixed energy estimator directly from a DataFrame returned by lomc!. The keyword arguments hproj_name, vproj_name, and shift_name can be used to change the names of the relevant columns.

See also growth_estimator().

Rimu.StatsTools.mixed_estimator_analysis — Method

mixed_estimator_analysis(df::DataFrame; kwargs...)
-> (; df_me, correlation_estimate, se, se_l, se_u)

Compute the mixed_estimator on a DataFrame df returned from lomc! repeatedly over a range of reweighting depths.

Returns a NamedTuple with the fields

df_me: DataFrame with reweighting depth and mixed_estiamator data. See example below.
correlation_estimate: estimated correlation time from blocking analysis
se, se_l, se_u: shift_estimator and error

Keyword arguments

h_range: The default is about h_values values from 0 to twice the estimated correlation time
h_values = 100: minimum number of reweighting depths
skip = 0: initial time steps to exclude from averaging
threading = Threads.nthreads() > 1: if false a progress meter is displayed
shift_name = :shift name of column in df with shift data
hproj_name = :hproj name of column in df with operator overlap data
vproj_name = :vproj name of column in df with projector overlap data
warn = true whether to log warning messages when blocking fails or denominators are small

Example

df, _ = lomc!(...)
df_me, correlation_estimate, se, se_l, se_u = mixed_estimator_analysis(df; skip=5_000)

using StatsPlots
@df df_me plot(_ -> se, :h, ribbon = (se_l, se_u), label = "⟨S⟩") # constant line and ribbon for shift estimator
@df df_me plot!(:h, :val, ribbon = (:val_l, :val_u), label="E_mix") # mixed estimator as a function of reweighting depth
xlabel!("h")

See also: mixed_estimator, growth_estimator_analysis.

Rimu.StatsTools.projected_energy — Method

projected_energy(
    df::DataFrame;
    skip=0, hproj=:hproj, vproj=:vproj, kwargs...
) -> r::RatioBlockingResult

Compute the projected energy estimator

\[E_\mathrm{p} = \frac{\sum_n \mathbf{v}⋅Ĥ\mathbf{c}^{(n)}} {\sum_m \mathbf{v}⋅\mathbf{c}^{(m)}} ,\]

where the time series df.hproj == $\mathbf{v}⋅Ĥ\mathbf{c}^{(n)}$ and df.vproj == $\mathbf{v}⋅\mathbf{c}^{(m)}$ are taken from df, skipping the first skip entries (use post_step =ProjectedEnergy() to set these up in lomc!()). projected_energy is equivalent to mixed_estimator with h=0.

The keyword arguments hproj and vproj can be used to change the names of the relevant columns. Other kwargs are passed on to ratio_of_means. Returns a RatioBlockingResult.

See NamedTuple, val_and_errs, val, errs for processing results.

Rimu.StatsTools.rayleigh_replica_estimator — Method

rayleigh_replica_estimator(
    op_ol, vec_ol, shift, h, dτ;
    skip = 0,
    E_r = mean(shift[skip+1:end]),
    weights = w_exp,
    kwargs...
) -> r::RatioBlockingResult
rayleigh_replica_estimator(
    df::DataFrame;
    shift_name="shift",
    op_name="Op1", 
    vec_name="dot", 
    h=0,
    skip=0, 
    Anorm=1,
    kwargs...
) -> r::RatioBlockingResult

Compute the estimator of a Rayleigh quotient of operator $\hat{A}$ with reweighting,

\[A_\mathrm{est}(h) = \frac{\sum_{a<b} \sum_n w_{h,a}^{(n)} w_{h,b}^{(n)} \mathbf{c}_a^{(n)} \cdot \hat{A} \cdot \mathbf{c}_b^{(n)}} {\sum_{a<b} \sum_n w_{h,a}^{(n)} w_{h,b}^{(n)} \mathbf{c}_a^{(n)} \cdot \mathbf{c}_b^{(n)}},\]

using data from multiple replicas.

Argument op_ol holds data for the operator overlap $\mathbf{c}_a^{(n)} \hat{A} \mathbf{c}_b^{(n)}$ and vec_ol holds data for the vector overlap $\mathbf{c}_a^{(n)} \mathbf{c}_b^{(n)}$. They are of type Vector{Vector}, with each element Vector holding the data for a pair of replicas. Argument shift is of type Vector{Vector}, with each element Vector holding the shift data for each individual replica.

The second method computes the Rayleigh quotient directly from a DataFrame returned by lomc!. The keyword arguments shift_name, op_name and vec_name can be used to change the names of the relevant columns, see AllOverlaps for default formatting. The operator overlap data can be scaled by a prefactor Anorm. A specific reweighting depth can be set with keyword argument h. The default is h = 0 which calculates the Rayleigh quotient without reweighting.

The reweighting is an extension of the mixed estimator using the reweighting technique described in Umrigar et al. (1993). Reweighting is done over h time steps and length(shift) - skip time steps are used for the blocking analysis done with ratio_of_means(). dτ is the time step and weights a function that calulates the weights. See w_exp() and w_lin(). Additional keyword arguments are passed on to ratio_of_means().

Error propagation is done with MonteCarloMeasurements. Results are returned as RatioBlockingResult.

See also mixed_estimator, growth_estimator().

Rimu.StatsTools.rayleigh_replica_estimator_analysis — Method

rayleigh_replica_estimator_analysis(df::DataFrame; kwargs...)
-> (; df_rre, df_se)

Compute the rayleigh_replica_estimator on a DataFrame df returned from lomc! repeatedly over a range of reweighting depths.

Returns a NamedTuple with the fields

df_rre: DataFrame with reweighting depth and rayleigh_replica_estimator data. See example below.
df_se: DataFrame with shift_estimator output, one row per replica

Keyword arguments

h_range: The default is about h_values values from 0 to twice the estimated correlation time
h_values = 100: minimum number of reweighting depths
skip = 0: initial time steps to exclude from averaging
threading = Threads.nthreads() > 1: if false a progress meter is displayed
shift_name = "shift": shift data corresponding to column in df with names <shift>_1, ...
op_name = "Op1": name of operator overlap corresponding to column in df with names c1_<op_ol>_c2, ...
vec_name = "dot": name of vector-vector overlap corresponding to column in df with names c1_<vec_ol>_c2, ...
Anorm = 1: a scalar prefactor to scale the operator overlap data
warn = true: whether to log warning messages when blocking fails or denominators are small

Example

df, _ = lomc!(...)
df_rre, df_se = rayleigh_replica_estimator_analysis(df; skip=5_000)

using StatsPlots
@df df_rre plot(_ -> se, :h, ribbon = (se_l, se_u), label = "⟨S⟩") # constant line and ribbon for shift estimator
@df df_rre plot!(:h, :val, ribbon = (:val_l, :val_u), label="E_mix") # Rayleigh quotient estimator as a function of reweighting depth
xlabel!("h")

See also: rayleigh_replica_estimator, mixed_estimator_analysis, AllOverlaps.

Rimu.StatsTools.shift_estimator — Method

shift_estimator(df::DataFrame; shift=:shift, kwargs...) -> r::BlockingResult

Return the shift estimator from the data in df.shift. The keyword argument shift can be used to change the name of the relevant column. Other keyword arguments are passed on to blocking_analysis. Returns a BlockingResult.

See also growth_estimator, projected_energy.

Rimu.StatsTools.w_exp — Method

w_exp(shift, h, dτ; E_r = mean(shift), skip = 0)

Compute the weights for reweighting over h time steps with reference energy E_r from the exponential formula

\[w_h^{(n)} = \prod_{j=1}^h \exp[-dτ(S^{(q+n-j)}-E_r)] ,\]

where q = skip.

See also w_lin(), growth_estimator(), mixed_estimator().

Rimu.StatsTools.w_lin — Method

w_lin(shift, h, dτ; E_r = mean(shift), skip = 0)

Compute the weights for reweighting over h time steps with reference energy E_r from the linearised formula

\[w_h^{(n)} = \prod_{j=1}^h [1-dτ(S^{(q+n-j)}-E_r)] ,\]

where q = skip.

See also w_exp(), growth_estimator(), mixed_estimator().

Rimu.StatsTools.replica_fidelity — Method

replica_fidelity(df::DataFrame; p_field = :hproj, skip = 0)

Compute the fidelity of the average coefficient vector and the projector defined in p_field from the result of replica lomc! passed as argument df, using replicas _1 and _2. Calls ratio_of_means() to perform a blocking analysis on a ratio of the means of separate time series and returns a RatioBlockingResult. The first skip steps in the time series are skipped.

The fidelity of states |ψ⟩ and |ϕ⟩ is defined as

\[F(ψ,ϕ) = \frac{|⟨ψ|ϕ⟩|^2}{⟨ψ|ψ⟩⟨ϕ|ϕ⟩} .\]

Specifically, replica_fidelity computes

\[F(\mathbf{v},⟨\mathbf{c}⟩) = \frac{⟨(\mathbf{c}_1⋅\mathbf{v})(\mathbf{v}⋅\mathbf{c}_1)⟩} {⟨\mathbf{c}_1⋅\mathbf{c}_1⟩} ,\]

where v is the projector specified by p_field, which is assumed to be normalised to unity with the two-norm (i.e. v⋅v == 1), and $\mathbf{c}_1$ and $\mathbf{c}_2$ are two replica coefficient vectors.

Rimu.StatsTools.variational_energy_estimator — Method

variational_energy_estimator(shifts, overlaps; kwargs...)
variational_energy_estimator(df::DataFrame; max_replicas=:all, kwargs...)
-> r::RatioBlockingResult

Compute the variational energy estimator from the replica time series of the shifts and coefficient vector overlaps by blocking analysis. The keyword argument max_replicas can be used to constrain the number of replicas processed to be smaller than all available in df. Other keyword arguments are passed on to ratio_of_means(). Returns a RatioBlockingResult.

An estimator for the variational energy

\[\frac{⟨\mathbf{c}⟩^† \mathbf{H}⟨\mathbf{c}⟩}{⟨\mathbf{c}⟩^†⟨\mathbf{c}⟩}\]

is calculated from

\[Ē_{v} = \frac{\sum_{a<b}^R \overline{(S_a+S_b) \mathbf{c}_a^† \mathbf{c}_b}} {2\sum_{a<b}^R \overline{\mathbf{c}_a^† \mathbf{c}_b}} ,\]

where the sum goes over distinct pairs out of the $R$ replicas. See arXiv:2103.07800.

The DataFrame version can extract the relevant information from the result of lomc!. Set up lomc! with the keyword argument replica = AllOverlaps(R) and R ≥ 2. If passing shifts and overlaps, the data has to be arranged in the correct order (as provided in the DataFrame version).

See AllOverlaps.

Additional docstrings

MonteCarloMeasurements.Particles — Method

MonteCarloMeasurements.Particles(r::BlockingResult; mc_samples = 2000)
MonteCarloMeasurements.±(r::BlockingResult)

Convert a BlockingResult into a Particles object for nonlinear error propagation with MonteCarloMeasurements.

Rimu.StatsTools.BlockingResult — Type

BlockingResult(mean, err, err_err, p_cov, k, blocks)

Result of blocking_analysis.

Fields:

mean: sample mean
err: standard error (estimated standard deviation of the mean)
err_err: estimated uncertainty of err
p_cov: estimated pseudo covariance of mean, relevant for complex time series
k::Int: k-1 blocking steps were used to uncorrelate time series
blocks::Int: number of uncorrelated values after blocking

Has methods for NamedTuple, val_and_errs, val, errs, mean_and_se, Measurements.:±, MonteCarloMeasurements.Particles, and Statistics.cov for Complex data.

Example:

julia> blocking_analysis(smoothen(randn(2^10), 2^5))
BlockingResult{Float64}
  mean = -0.026 ± 0.029
  with uncertainty of ± 0.003638545517264226
  from 32 blocks after 5 transformations (k = 6).

Measurements.measurement — Method

measurement(r::BlockingResult)
Measurements.±(r::BlockingResult)

Convert a BlockingResult into a Measurement for linear error propagation with Measurements.

Limitation: Does not account for covariance in complex BlockingResult. Consider using MonteCarloMeasurements.Particles(r)!

Rimu.StatsTools.blocker — Method

blocker(v::Vector) -> new_v::Vector

Reblock the data by successively taking the mean of two adjacent data points to form a new vector with a half of the length(v). The last data point will be discarded if length(v) is odd.

Rimu.StatsTools.blocks_with_m — Method

blocks_with_m(v; corrected = true) -> (;blocks, mean, std_err, std_err_err, p_cov, mj)

Perform the blocking algorithm from Flyvberg and Peterson JCP (1989). Returns named tuple with the results from all blocking steps. See mtest().

Rimu.StatsTools.mtest — Method

mtest(mj::AbstractVector; α = 0.01) -> k
mtest(table::NamedTuple; α = 0.01) -> k

Hypothesis test for decorrelation of a time series after blocking transformations with significance level $1-α$ after Jonson PRE (2018). mj or table.mj is expected to be a vector with relevant $M_j$ values from a blocking analysis as obtained from blocks_with_m(). Returns the row number k where the M-test is passed. If the M-test has failed mtest() returns the value -1.

Rimu.StatsTools.RatioBlockingResult — Type

RatioBlockingResult(ratio, f, σ_f, δ_y, k, success)

Result of ratio_of_means().

Fields:

ratio::P: ratio with uncertainties propagated by MonteCarloMeasurements
f::T: ratio of means
σ_f::T: std from linear propagation
δ_y::T: coefficient of variation for denominator (≤ 0.1 for normal approx)
k::Int: k-1 blocking steps were used to uncorrelate time series
blocks::Int: number of data values after blocking
success::Bool: false if any of the blocking steps failed

Has methods for NamedTuple, val_and_errs, val, errs.

Note: to compute statistics on the RatioBlockingResult, use functions pmedian, pquantile, pmiddle, piterate, pextrema, pminimum, pmaximum, pmean, and pcov.

Rimu.StatsTools.particles — Method

particles(samples, μ, σ)
particles(samples, μ::AbstractVector, Σ::AbstractMatrix)

Return Particles object from MonteCarloMeasurements with single- or multivariate normal distribution. Zero variance parameters are supported.

Rimu.StatsTools.particles — Method

particles(samples, d)
particles(::Nothing, d)
particles(::Val{T}, d) where T

Return Particles object from MonteCarloMeasurements using a type-stable constructor if possible. Pass nothing for the default number of particles or Val(1_000) for using 1000 particles in a type-stable manner. If d is a Particles object it is passed through without re-sampling.

Rimu.StatsTools.ratio_estimators — Method

ratio_estimators(x, y, [k]; corrected=true, mc_samples=10_000) -> (; r, f, σ_f, δ_y, n)

Estimators for the ratio of means mean(x)/mean(y). If k is given, k-1 blocking steps are performed to remove internal correlations in the time series x and y. Otherwise these are assumed to be free of internal correlations. Correlations between x and y may be present and are taken into account.

Return values:

r::Particles is the Monte Carlo sampled ratio estimator, see Particles
f = mean(x)/mean(y)
σ_f standard deviation of f from linear error propagation (normal approximation)
δ_y = std(y)/mean(y) coefficient of variation; < 0.1 for normal approximation to work
n: number of uncorrelated data used for uncertainty estimation

Rimu.StatsTools.x_by_y_linear — Method

x_by_y_linear(μ_x,μ_y,σ_x,σ_y,ρ) -> f, σ_f

Linear error propagation for ratio f = x/y assuming x and y are correlated normal random variables and assuming the ratio can be approximated as a normal distribution. See wikipedia and Díaz-Francés, Rubio (2013).

\[σ_f = \sqrt{\frac{σ_x}{μ_y}^2 + \frac{μ_x σ_y}{μ_y^2}^2 - \frac{2 ρ μ_x}{μ_y^3}}\]

Core.NamedTuple — Method

NamedTuple(x::BlockingResult; n=1, p=nothing, name=:val)
NamedTuple(x::RatioBlockingResult; n=1, p=nothing, name=:val)

Return a named tuple with value and error bars (see val_and_errs) as well as additional numerical fields relevant for x.

Example:

julia> results = [blocking_analysis(i:0.1:2i+20) for i in 1:3]; # mock results

julia> df = NamedTuple.(results, name=:res)|>DataFrame
3×7 DataFrame
 Row │ res      res_l    res_u    res_err_err  res_p_cov  res_k  res_blocks
     │ Float64  Float64  Float64  Float64      Float64    Int64  Int64
─────┼──────────────────────────────────────────────────────────────────────
   1 │    11.5  1.7282   1.7282      0.352767    2.98667      5          13
   2 │    13.0  1.7282   1.7282      0.352767    2.98667      5          13
   3 │    14.5  1.78885  1.78885     0.350823    3.2          5          14

julia> rbs = ratio_of_means(1 .+sin.(1:0.1:11),2 .+sin.(2:0.1:12)); # more mock results

julia> [NamedTuple(rbs),]|>DataFrame
1×9 DataFrame
 Row │ val       val_l      val_u      val_f     val_σ_f    val_δ_y    val_k  val_blocks  val_success
     │ Float64   Float64    Float64    Float64   Float64    Float64    Int64  Int64       Bool
─────┼────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 0.581549  0.0925669  0.0812292  0.560532  0.0875548  0.0875548      4          12         true

See val_and_errs, val, errs, BlockingResult, RatioBlockingResult.

Rimu.StatsTools.autocovariance — Method

autocovariance(v::Vector,h::Int; corrected::Bool=true)

$\hat{\gamma}(h) =\frac{1}{n}\sum_{t=1}^{n-h}(v_{t+h}-\bar{v})(v_t-\bar{v})^*$ Calculate the autocovariance of dataset v with a delay h. If corrected is true (the default) then the sum is scaled with n-h, whereas the sum is scaled with n if corrected is false where n = length(v).

Rimu.StatsTools.pseudo_cov — Method

pseudo_cov(x, y; xmean = mean(x), ymean = mean(y), corrected = true)

Compute the pseudo covariance between collections x and y returning a scalar:

\[\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_{i} - \bar{y})\]

Optionally, precomputed means can be passed as keyword arguments. pseudo_cov(x,y) is functionally equivalent to Statistics.cov(x, conj(y); corrected = false) but it is found to be significantly faster and avoids allocations.

Index