Module Rimu/StatsTools

The module Rimu/StatsTools contains helper function for analysis and post processing of Monte Carlo data.

Usage

The module is not exported by default. In oder to use its functions without explicitly specifying the submodule, import the module with

using Rimu, Rimu.StatsTools

Blocking analysis

For blocking analysis of a single time series use blocking_analysis, or mean_and_se.

For blocking analysis of a couple of time series where the ration of means is the quantity of interest use ratio_of_means.

Exported

Rimu.StatsTools.blocking_analysisMethod
blocking_analysis(v::AbstractVector; α = 0.01, corrected = true)
-> BlockingResult(mean, err, err_err, p_cov, k, blocks)

Compute the sample mean mean and estimate the standard deviation of the mean (standard error) err of a correlated time series using the blocking algorithm from Flyvberg and Peterson JCP (1989) and the M test of Jonsson PRE (2018) at significance level $1-α$. k is the number of blocking transformations required to pass the hypothesis test for an uncorrelated time series and err_err the estimated standard error or err.

If decorrelating the time series fails according to the M test, NaN is returned as the standard error and -1 for k. corrected controls whether bias correction for variances is used.

See BlockingResult.

source
Rimu.StatsTools.ratio_of_meansMethod
ratio_of_means(num, denom; α = 0.01, corrected = true, mc_samples = 10_000) -> r

Estimate the ratio of mean(num)/mean(denom) assuming that num and denom are possibly correlated time series. A blocking analysis with m-test is used to uncorrelate the time series, see blocking_analysis(). The remaining standard error and correlation of the means is propagated using MonteCarloMeasurements. The results are reported as a RatioBlockingResult.

Robust estimates for the ratio are obtained from median(r) and confidence intervals from quantile(), e.g. quantile(r, [0.025, 0.975]) for the 95% confidence interval.

Estimates from linear uncertainty propagation are returned as r.f and r.σ_f using x_by_y_linear().

source
Rimu.StatsTools.med_and_errsMethod
med_and_errs(p) -> (; med, err1_l, err1_u, err2_l, err2_u)

Convenience function for extracting plottable data from a distribution or an uncertain object created by the packages MonteCarloMeasurements or Measurements. Returns the median med and the lower err1_l and upper err1_u standard error (for 1σ or 68% confidence inteval). err2_l and err2_u provide the lower and upper error limits for the 2σ or 95% confidence interval.

Example:

julia> results = [Particles(d) for d in datasets] # Particles[]
julia> res_w_errs = med_and_errs.(results) # Vector of NamedTuple's with standard errors
julia> res_df = DataFrame(res_w_errs) # results as DataFrame with lower an upper error
1×5 DataFrame
 Row │ med      err1_l     err1_u     err2_l    err2_u
     │ Float64  Float64    Float64    Float64   Float64
─────┼────────────────────────────────────────────────────
   1 │ 1.01325  0.0173805  0.0183057  0.034042  0.0366713
source
Rimu.StatsTools.ratio_with_errsMethod
ratio_with_errs(r::RatioBlockingResult)
-> (;ratio=med, err1_l, err1_u, err2_l, err2_u, f, σ_f, δ_y, k, success)

Convenience function for extracting plottable data from RatioBlockingResult. Returns NamedTuple with median and standard error of r extracted by p_to_errs(). See also ratio_of_means().

Example:

julia> results = [ratio_of_means(n[i], d[i]; args...) for i in datasets]
julia> res_w_errs = ratio_with_errs.(results) # Vector of NamedTuple's with standard errors
julia> res_df = DataFrame(res_w_errs) # results as DataFrame with lower an upper error
1×10 DataFrame
 Row │ ratio    err1_l     err1_u     err2_l    err2_u     f        σ_f        δ_y        k      success
     │ Float64  Float64    Float64    Float64   Float64    Float64  Float64    Float64    Int64  Bool
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 1.01325  0.0173805  0.0183057  0.034042  0.0366713  1.01361  0.0181869  0.0128806      2     true
source
Rimu.StatsTools.replica_fidelityMethod
replica_fidelity(rr::Tuple; p_field = :hproj, skip = 0)

Compute the fidelity of the average coefficient vector and the projector defined in p_field from the result of replica fci_qmc!() passed as argument rr (a tuple of DataFrames). Calls ratio_of_means() to perform a blocking analysis on a ratio of the means of separate time series and returns a RatioBlockingResult. The first skip steps in the time series are skipped.

The fidelity of states |ψ⟩ and |ϕ⟩ is defined as

\[F(ψ,ϕ) = \frac{|⟨ψ|ϕ⟩|^2}{⟨ψ|ψ⟩⟨ϕ|ϕ⟩} .\]

Specifically, replica_fidelity computes

\[F(\mathbf{v},⟨\mathbf{c}⟩) = \frac{⟨(\mathbf{c}_A⋅\mathbf{v})(\mathbf{v}⋅\mathbf{c}_B)⟩} {⟨\mathbf{c}_A⋅\mathbf{c}_B⟩} ,\]

where v is the projector specified by p_field, which is assumed to be normalised to unity with the two-norm (i.e. v⋅v == 1), and $\mathbf{c}_A$ and $\mathbf{c}_B$ are two replica coefficient vectors.

source
Rimu.StatsTools.to_measurementMethod
to_measurement(p::MonteCarloMeasurements.Particles) -> ::Measurements.measurement

Convert an uncertain number from MonteCarloMeasurements to Measurements format using the median as the central point. The new ± boundaries will include the 68% quantile around the median.

source
Rimu.StatsTools.growth_witnessMethod
growth_witness(norm::AbstractArray, shift::AbstractArray, dt, [b]) -> g
growth_witness(df::DataFrame, [b]) -> g

Compute the growth witness

\[G^{(n)} = S^{(n)} - \frac{\vert\mathbf{c}^{(n+1)}\vert - \vert\mathbf{c}^{(n)}\vert}{\vert\mathbf{c}^{(n)}\vert d\tau},\]

where S is the shift and $\vert\mathbf{c}^{(n)}\vert ==$ norm[n, 1]. Setting b ≥ 1 a sliding average over b time steps is computed using smoothen().

source
Rimu.StatsTools.smoothenMethod
smoothen(noisy::AbstractVector, b)

Smoothen the array noisy by averaging over a sliding window of length b and wrapping noisy periodically. The mean(noisy) is preserved.

source
Rimu.StatsTools.growth_estimatorMethod
growth_estimator(
    shift, wn, h, dτ;
    skip = 0,
    E_r = mean(shift[skip+1:end]),
    weights = w_exp,
    change_type = identity,
    kwargs...,
) -> (; E_gr::Particles, k, blocks, success)
growth_estimator(df::DataFrame, h; kwargs ...)

Compute the growth estimator with reference energy E_r by the reweighting technique described in Umirgar et al. (1993), see Eq. (20). shift and wn are equal length vectors containing the shift and walker number time series, respectively. Reweighting is done over h time steps and length(shift) - skip time steps are used for the blocking analysis done with ratio_of_means(). is the time step and weights a function that calulates the weights. See w_exp() and w_lin().

\[E_{gr} = E_r - \frac{1}{dτ}\ln \frac{\sum_n w_{h+1}^{(n+1)} N_\mathrm{w}^{(n+1)}} {\sum_m w_{h}^{(m)} N_\mathrm{w}^{(m)}}\]

When h is greater than the autocorrelation time scale of the shift, then E_gr is an unbiased but approximate estimator for the ground state energy $E_0$ with an error $\mathcal{O}(dτ^2)$ and potentially increased confidence intervals compared to the (biased) shift estimator. Error propagation is done with MonteCarloMeasurements. Progagation through the logarithm can be modified by setting change_type to to_measurement in order to avoid NaN results from negative outliers.

If success==true the blocking analysis was successful in k-1 steps, using blocks uncorrelated data points.

See also mixed_estimator().

source
Rimu.StatsTools.mixed_estimatorMethod
mixed_estimator(
    hproj, vproj, shift, h, dτ;
    skip = 0,
    E_r = mean(shift[skip+1:end]),
    weights = w_exp,
    kwargs...,
) -> r::RatioBlockingResult
mixed_estimator(df::DataFrame, h; kwargs...)

Compute the mixed estimator by the reweighting technique described in Umirgar et al. (1993), Eq. (19)

\[E_\mathrm{mix} = \frac{\sum_n w_{h}^{(n)} (Ĥ'\mathrm{v})⋅\mathrm{c}^{(n)}} {\sum_m w_{h}^{(m)} \mathrm{v}⋅\mathrm{c}^{(m)}} ,\]

where the time series hproj == $(Ĥ'\mathrm{v})⋅\mathrm{c}^{(n)}$ and vproj == $\mathrm{v}⋅\mathrm{c}^{(m)}$ have the same length as shift (See ReportingStrategy on how to set these up). Reweighting is done over h time steps and length(shift) - skip time steps are used for the blocking analysis done with ratio_of_means(). is the time step and weights a function that calulates the weights. See w_exp() and w_lin(). Additional keyword arguments are passed on to ratio_of_means().

When h is greater than the autocorrelation time scale of the shift, then r.ratio is an unbiased but approximate estimator for the ground state energy $E_0$ with an error $\mathcal{O}(dτ^2)$ and potentially increased confidence intervals compared to the unweighted ratio. Error propagation is done with MonteCarloMeasurements. Results are returned as RatioBlockingResult.

See also growth_estimator().

source

Additional docstrings

Rimu.StatsTools.BlockingResultType
BlockingResult(mean, err, err_err, p_cov, k, blocks)

Result of blocking_analysis().

Fields:

  • mean: sample mean
  • err: standard error (estimated standard deviation of the mean)
  • err_err: estimated uncertainty of err
  • p_cov: estimated pseudo covariance of mean, relevant for complex time series
  • k::Int: k-1 blocking steps were used to uncorrelate time series
  • blocks::Int: number of uncorrelated values after blocking

Has methods for mean_and_se, Measurements.:±, MonteCarloMeasurements.Particles, and Statistics.cov for Complex data.

source
Rimu.StatsTools.blockerMethod
blocker(v::Vector) -> new_v::Vector

Reblock the data by successively taking the mean of two adjacent data points to form a new vector with a half of the length(v). The last data point will be discarded if length(v) is odd.

source
Rimu.StatsTools.blocks_with_mMethod
blocks_with_m(v; corrected = true) -> (;blocks, mean, std_err, std_err_err, p_cov, mj)

Perform the blocking algorithm from Flyvberg and Peterson JCP (1989). Returns named tuple with the results from all blocking steps. See mtest().

source
Rimu.StatsTools.mtestMethod
mtest(mj::AbstractVector; α = 0.01, warn = true) -> k
mtest(table; α = 0.01, warn = true) -> k

Hypothesis test for decorrelation of a time series after blocking transformations with significance level $1-α$ after Jonson PRE (2018). mj or table.mj is expected to be a vector with relevant $M_j$ values from a blocking analysis as obtained from blocks_with_m(). Returns the row number k where the M-test is passed. If the M-test has failed mtest() returns the value -1 and optionally prints a warning message.

source
Statistics.covMethod
cov(r::BlockingResult{<:Complex})

Return the covariance matrix of the multivariate normal distribution approximating the uncertainty of the blocking result r of a complex time series. See (https://en.wikipedia.org/wiki/Complexnormaldistribution).

source
Rimu.StatsTools.RatioBlockingResultType
RatioBlockingResult(ratio, f, σ_f, δ_y, k, success)

Result of ratio_of_means().

Fields:

  • ratio::P: ratio with uncertainties propagated by MonteCarloMeasurements
  • f::T: ratio of means
  • σ_f::T: std from linear propagation
  • δ_y::T: coefficient of variation for denominator (≤ 0.1 for normal approx)
  • k::Int: k-1 blocking steps were used to uncorrelate time series
  • blocks::Int: number of data values after blocking
  • success::Bool: false if any of the blocking steps failed
source
Rimu.StatsTools.ratio_estimatorsMethod
ratio_estimators(x, y, [k]; corrected=true, mc_samples=10_000) -> (; r, f, σ_f, δ_y, n)

Estimators for the ratio of means mean(x)/mean(y). If k is given, k-1 blocking steps are performed to remove internal correlations in the time series x and y. Otherwise these are assumed to be free of internal correlations. Correlations between x and y may be present and are taken into account.

Return values:

  • r::Particles is the Monte Carlo sampled ratio estimator, see Particles
  • f = mean(x)/mean(y)
  • σ_f standard deviation of f from linear error propagation (normal approximation)
  • δ_y = std(y)/mean(y) coefficient of variation; < 0.1 for normal approximation to work
  • n: number of uncorrelated data used for uncertainty estimation
source
Rimu.StatsTools.x_by_y_linearMethod
x_by_y_linear(μ_x,μ_y,σ_x,σ_y,ρ) -> f, σ_f

Linear error propagation for ratio f = x/y assuming x and y are correlated normal random variables and assuming the ratio can be approximated as a normal distribution. See wikipedia and Díaz-Francés, Rubio (2013).

\[σ_f = \sqrt{\frac{σ_x}{μ_y}^2 + \frac{μ_x σ_y}{μ_y^2}^2 - \frac{2 ρ μ_x}{μ_y^3}}\]

source
Rimu.StatsTools.autocovarianceMethod
autocovariance(v::Vector,h::Int; corrected::Bool=true)

$\hat{\gamma}(h) =\frac{1}{n}\sum_{t=1}^{n-h}(v_{t+h}-\bar{v})(v_t-\bar{v})^*$ Calculate the autocovariance of dataset v with a delay h. If corrected is true (the default) then the sum is scaled with n-h, whereas the sum is scaled with n if corrected is false where n = length(v).

source
Rimu.StatsTools.pseudo_covMethod
pseudo_cov(x, y; xmean = mean(x), ymean = mean(y), corrected = true)

Compute the pseudo covariance between collections x and y returning a scalar:

\[\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_{i} - \bar{y})\]

Optionally, precomputed means can be passed as keyword arguments. pseudo_cov(x,y) is functionally equivalent to Statistics.cov(x, conj(y); corrected = false) but it is found to be significantly faster and avoids allocations.

source

Index