Module Rimu/StatsTools
The module Rimu/StatsTools contains helper function for analysis and post processing of Monte Carlo data.
Usage
The module is not exported by default. In oder to use its functions without explicitly specifying the submodule, import the module with
using Rimu, Rimu.StatsToolsBlocking analysis
For blocking analysis of a single time series use blocking_analysis, or mean_and_se.
For blocking analysis of a couple of time series where the ration of means is the quantity of interest use ratio_of_means.
Exported
Rimu.StatsTools — ModuleTools for the statistical analysis of Monte Carlo data.
Exports:
Rimu.StatsTools.blocking_analysis — Methodblocking_analysis(v::AbstractVector; α = 0.01, corrected = true)
-> BlockingResult(mean, err, err_err, p_cov, k, blocks)Compute the sample mean mean and estimate the standard deviation of the mean (standard error) err of a correlated time series using the blocking algorithm from Flyvberg and Peterson JCP (1989) and the M test of Jonsson PRE (2018) at significance level $1-α$. k is the number of blocking transformations required to pass the hypothesis test for an uncorrelated time series and err_err the estimated standard error or err.
If decorrelating the time series fails according to the M test, NaN is returned as the standard error and -1 for k. corrected controls whether bias correction for variances is used.
See BlockingResult.
Rimu.StatsTools.mean_and_se — Methodmean_and_se(v::AbstractVector; α = 0.01, corrected::Bool=true) -> mean, err
mean_and_se(r::BlockingResult) -> mean, errReturn the mean and standard error (as a tuple) of a time series obtained from blocking_analysis. See also BlockingResult.
Rimu.StatsTools.ratio_of_means — Methodratio_of_means(num, denom; α = 0.01, corrected = true, mc_samples = 10_000) -> rEstimate the ratio of mean(num)/mean(denom) assuming that num and denom are possibly correlated time series. A blocking analysis with m-test is used to uncorrelate the time series, see blocking_analysis(). The remaining standard error and correlation of the means is propagated using MonteCarloMeasurements. The results are reported as a RatioBlockingResult.
Robust estimates for the ratio are obtained from median(r) and confidence intervals from quantile(), e.g. quantile(r, [0.025, 0.975]) for the 95% confidence interval.
Estimates from linear uncertainty propagation are returned as r.f and r.σ_f using x_by_y_linear().
Rimu.StatsTools.med_and_errs — Methodmed_and_errs(p) -> (; med, err1_l, err1_u, err2_l, err2_u)Convenience function for extracting plottable data from a distribution or an uncertain object created by the packages MonteCarloMeasurements or Measurements. Returns the median med and the lower err1_l and upper err1_u standard error (for 1σ or 68% confidence inteval). err2_l and err2_u provide the lower and upper error limits for the 2σ or 95% confidence interval.
Example:
julia> results = [Particles(d) for d in datasets] # Particles[]
julia> res_w_errs = med_and_errs.(results) # Vector of NamedTuple's with standard errors
julia> res_df = DataFrame(res_w_errs) # results as DataFrame with lower an upper error
1×5 DataFrame
Row │ med err1_l err1_u err2_l err2_u
│ Float64 Float64 Float64 Float64 Float64
─────┼────────────────────────────────────────────────────
1 │ 1.01325 0.0173805 0.0183057 0.034042 0.0366713Rimu.StatsTools.ratio_with_errs — Methodratio_with_errs(r::RatioBlockingResult)
-> (;ratio=med, err1_l, err1_u, err2_l, err2_u, f, σ_f, δ_y, k, success)Convenience function for extracting plottable data from RatioBlockingResult. Returns NamedTuple with median and standard error of r extracted by p_to_errs(). See also ratio_of_means().
Example:
julia> results = [ratio_of_means(n[i], d[i]; args...) for i in datasets]
julia> res_w_errs = ratio_with_errs.(results) # Vector of NamedTuple's with standard errors
julia> res_df = DataFrame(res_w_errs) # results as DataFrame with lower an upper error
1×10 DataFrame
Row │ ratio err1_l err1_u err2_l err2_u f σ_f δ_y k success
│ Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Int64 Bool
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 1.01325 0.0173805 0.0183057 0.034042 0.0366713 1.01361 0.0181869 0.0128806 2 trueRimu.StatsTools.replica_fidelity — Methodreplica_fidelity(rr::Tuple; p_field = :hproj, skip = 0)Compute the fidelity of the average coefficient vector and the projector defined in p_field from the result of replica fci_qmc!() passed as argument rr (a tuple of DataFrames). Calls ratio_of_means() to perform a blocking analysis on a ratio of the means of separate time series and returns a RatioBlockingResult. The first skip steps in the time series are skipped.
The fidelity of states |ψ⟩ and |ϕ⟩ is defined as
\[F(ψ,ϕ) = \frac{|⟨ψ|ϕ⟩|^2}{⟨ψ|ψ⟩⟨ϕ|ϕ⟩} .\]
Specifically, replica_fidelity computes
\[F(\mathbf{v},⟨\mathbf{c}⟩) = \frac{⟨(\mathbf{c}_A⋅\mathbf{v})(\mathbf{v}⋅\mathbf{c}_B)⟩} {⟨\mathbf{c}_A⋅\mathbf{c}_B⟩} ,\]
where v is the projector specified by p_field, which is assumed to be normalised to unity with the two-norm (i.e. v⋅v == 1), and $\mathbf{c}_A$ and $\mathbf{c}_B$ are two replica coefficient vectors.
Rimu.StatsTools.to_measurement — Methodto_measurement(p::MonteCarloMeasurements.Particles) -> ::Measurements.measurementConvert an uncertain number from MonteCarloMeasurements to Measurements format using the median as the central point. The new ± boundaries will include the 68% quantile around the median.
Rimu.StatsTools.growth_witness — Methodgrowth_witness(norm::AbstractArray, shift::AbstractArray, dt, [b]) -> g
growth_witness(df::DataFrame, [b]) -> gCompute the growth witness
\[G^{(n)} = S^{(n)} - \frac{\vert\mathbf{c}^{(n+1)}\vert - \vert\mathbf{c}^{(n)}\vert}{\vert\mathbf{c}^{(n)}\vert d\tau},\]
where S is the shift and $\vert\mathbf{c}^{(n)}\vert ==$ norm[n, 1]. Setting b ≥ 1 a sliding average over b time steps is computed using smoothen().
Rimu.StatsTools.smoothen — Methodsmoothen(noisy::AbstractVector, b)Smoothen the array noisy by averaging over a sliding window of length b and wrapping noisy periodically. The mean(noisy) is preserved.
Rimu.StatsTools.growth_estimator — Methodgrowth_estimator(
shift, wn, h, dτ;
skip = 0,
E_r = mean(shift[skip+1:end]),
weights = w_exp,
change_type = identity,
kwargs...,
) -> (; E_gr::Particles, k, blocks, success)
growth_estimator(df::DataFrame, h; kwargs ...)Compute the growth estimator with reference energy E_r by the reweighting technique described in Umirgar et al. (1993), see Eq. (20). shift and wn are equal length vectors containing the shift and walker number time series, respectively. Reweighting is done over h time steps and length(shift) - skip time steps are used for the blocking analysis done with ratio_of_means(). dτ is the time step and weights a function that calulates the weights. See w_exp() and w_lin().
\[E_{gr} = E_r - \frac{1}{dτ}\ln \frac{\sum_n w_{h+1}^{(n+1)} N_\mathrm{w}^{(n+1)}} {\sum_m w_{h}^{(m)} N_\mathrm{w}^{(m)}}\]
When h is greater than the autocorrelation time scale of the shift, then E_gr is an unbiased but approximate estimator for the ground state energy $E_0$ with an error $\mathcal{O}(dτ^2)$ and potentially increased confidence intervals compared to the (biased) shift estimator. Error propagation is done with MonteCarloMeasurements. Progagation through the logarithm can be modified by setting change_type to to_measurement in order to avoid NaN results from negative outliers.
If success==true the blocking analysis was successful in k-1 steps, using blocks uncorrelated data points.
See also mixed_estimator().
Rimu.StatsTools.mixed_estimator — Methodmixed_estimator(
hproj, vproj, shift, h, dτ;
skip = 0,
E_r = mean(shift[skip+1:end]),
weights = w_exp,
kwargs...,
) -> r::RatioBlockingResult
mixed_estimator(df::DataFrame, h; kwargs...)Compute the mixed estimator by the reweighting technique described in Umirgar et al. (1993), Eq. (19)
\[E_\mathrm{mix} = \frac{\sum_n w_{h}^{(n)} (Ĥ'\mathrm{v})⋅\mathrm{c}^{(n)}} {\sum_m w_{h}^{(m)} \mathrm{v}⋅\mathrm{c}^{(m)}} ,\]
where the time series hproj == $(Ĥ'\mathrm{v})⋅\mathrm{c}^{(n)}$ and vproj == $\mathrm{v}⋅\mathrm{c}^{(m)}$ have the same length as shift (See ReportingStrategy on how to set these up). Reweighting is done over h time steps and length(shift) - skip time steps are used for the blocking analysis done with ratio_of_means(). dτ is the time step and weights a function that calulates the weights. See w_exp() and w_lin(). Additional keyword arguments are passed on to ratio_of_means().
When h is greater than the autocorrelation time scale of the shift, then r.ratio is an unbiased but approximate estimator for the ground state energy $E_0$ with an error $\mathcal{O}(dτ^2)$ and potentially increased confidence intervals compared to the unweighted ratio. Error propagation is done with MonteCarloMeasurements. Results are returned as RatioBlockingResult.
See also growth_estimator().
Rimu.StatsTools.w_exp — Methodw_exp(shift, h, dτ; E_r = mean(shift), skip = 0)Compute the weights for reweighting over h time steps with reference energy E_r from the exponential formula
\[w_h^{(n)} = \prod_{j=1}^h \exp[-dτ(S^{(q+n-j)}-E_r)] ,\]
where q = skip.
See also w_lin(), growth_estimator(), mixed_estimator().
Rimu.StatsTools.w_lin — Methodw_lin(shift, h, dτ; E_r = mean(shift), skip = 0)Compute the weights for reweighting over h time steps with reference energy E_r from the linearised formula
\[w_h^{(n)} = \prod_{j=1}^h [1-dτ(S^{(q+n-j)}-E_r)] ,\]
where q = skip.
See also w_exp(), growth_estimator(), mixed_estimator().
Additional docstrings
MonteCarloMeasurements.Particles — MethodMonteCarloMeasurements.Particles(r::BlockingResult; mc_samples = 2000)
MonteCarloMeasurements.±(r::BlockingResult)Convert a BlockingResult into a Particles object for nonlinear error propagation with MonteCarloMeasurements.
Rimu.StatsTools.BlockingResult — TypeBlockingResult(mean, err, err_err, p_cov, k, blocks)Result of blocking_analysis().
Fields:
mean: sample meanerr: standard error (estimated standard deviation of the mean)err_err: estimated uncertainty oferrp_cov: estimated pseudo covariance ofmean, relevant for complex time seriesk::Int: k-1 blocking steps were used to uncorrelate time seriesblocks::Int: number of uncorrelated values after blocking
Has methods for mean_and_se, Measurements.:±, MonteCarloMeasurements.Particles, and Statistics.cov for Complex data.
Measurements.measurement — Methodmeasurement(r::BlockingResult)
Measurements.±(r::BlockingResult)Convert a BlockingResult into a Measurement for linear error propagation with Measurements.
Limitation: Does not account for covariance in complex BlockingResult. Consider using MonteCarloMeasurements.Particles(r)!
Rimu.StatsTools.blocker — Methodblocker(v::Vector) -> new_v::VectorReblock the data by successively taking the mean of two adjacent data points to form a new vector with a half of the length(v). The last data point will be discarded if length(v) is odd.
Rimu.StatsTools.blocks_with_m — Methodblocks_with_m(v; corrected = true) -> (;blocks, mean, std_err, std_err_err, p_cov, mj)Perform the blocking algorithm from Flyvberg and Peterson JCP (1989). Returns named tuple with the results from all blocking steps. See mtest().
Rimu.StatsTools.mtest — Methodmtest(mj::AbstractVector; α = 0.01, warn = true) -> k
mtest(table; α = 0.01, warn = true) -> kHypothesis test for decorrelation of a time series after blocking transformations with significance level $1-α$ after Jonson PRE (2018). mj or table.mj is expected to be a vector with relevant $M_j$ values from a blocking analysis as obtained from blocks_with_m(). Returns the row number k where the M-test is passed. If the M-test has failed mtest() returns the value -1 and optionally prints a warning message.
Statistics.cov — Methodcov(r::BlockingResult{<:Complex})Return the covariance matrix of the multivariate normal distribution approximating the uncertainty of the blocking result r of a complex time series. See (https://en.wikipedia.org/wiki/Complexnormaldistribution).
Rimu.StatsTools.RatioBlockingResult — TypeRatioBlockingResult(ratio, f, σ_f, δ_y, k, success)Result of ratio_of_means().
Fields:
ratio::P: ratio with uncertainties propagated by MonteCarloMeasurementsf::T: ratio of meansσ_f::T: std from linear propagationδ_y::T: coefficient of variation for denominator (≤ 0.1 for normal approx)k::Int: k-1 blocking steps were used to uncorrelate time seriesblocks::Int: number of data values after blockingsuccess::Bool: false if any of the blocking steps failed
Rimu.StatsTools.ratio_estimators — Methodratio_estimators(x, y, [k]; corrected=true, mc_samples=10_000) -> (; r, f, σ_f, δ_y, n)Estimators for the ratio of means mean(x)/mean(y). If k is given, k-1 blocking steps are performed to remove internal correlations in the time series x and y. Otherwise these are assumed to be free of internal correlations. Correlations between x and y may be present and are taken into account.
Return values:
r::Particlesis the Monte Carlo sampled ratio estimator, seeParticlesf = mean(x)/mean(y)σ_fstandard deviation offfrom linear error propagation (normal approximation)δ_y = std(y)/mean(y)coefficient of variation; < 0.1 for normal approximation to workn: number of uncorrelated data used for uncertainty estimation
Rimu.StatsTools.x_by_y_linear — Methodx_by_y_linear(μ_x,μ_y,σ_x,σ_y,ρ) -> f, σ_fLinear error propagation for ratio f = x/y assuming x and y are correlated normal random variables and assuming the ratio can be approximated as a normal distribution. See wikipedia and Díaz-Francés, Rubio (2013).
\[σ_f = \sqrt{\frac{σ_x}{μ_y}^2 + \frac{μ_x σ_y}{μ_y^2}^2 - \frac{2 ρ μ_x}{μ_y^3}}\]
Rimu.StatsTools.autocovariance — Methodautocovariance(v::Vector,h::Int; corrected::Bool=true)$\hat{\gamma}(h) =\frac{1}{n}\sum_{t=1}^{n-h}(v_{t+h}-\bar{v})(v_t-\bar{v})^*$ Calculate the autocovariance of dataset v with a delay h. If corrected is true (the default) then the sum is scaled with n-h, whereas the sum is scaled with n if corrected is false where n = length(v).
Rimu.StatsTools.pseudo_cov — Methodpseudo_cov(x, y; xmean = mean(x), ymean = mean(y), corrected = true)Compute the pseudo covariance between collections x and y returning a scalar:
\[\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_{i} - \bar{y})\]
Optionally, precomputed means can be passed as keyword arguments. pseudo_cov(x,y) is functionally equivalent to Statistics.cov(x, conj(y); corrected = false) but it is found to be significantly faster and avoids allocations.
Index
Rimu.StatsToolsMonteCarloMeasurements.ParticlesRimu.StatsTools.BlockingResultRimu.StatsTools.RatioBlockingResultMeasurements.measurementRimu.StatsTools.autocovarianceRimu.StatsTools.blockerRimu.StatsTools.blocking_analysisRimu.StatsTools.blocks_with_mRimu.StatsTools.growth_estimatorRimu.StatsTools.growth_witnessRimu.StatsTools.mean_and_seRimu.StatsTools.med_and_errsRimu.StatsTools.mixed_estimatorRimu.StatsTools.mtestRimu.StatsTools.pseudo_covRimu.StatsTools.ratio_estimatorsRimu.StatsTools.ratio_of_meansRimu.StatsTools.ratio_with_errsRimu.StatsTools.replica_fidelityRimu.StatsTools.smoothenRimu.StatsTools.to_measurementRimu.StatsTools.w_expRimu.StatsTools.w_linRimu.StatsTools.x_by_y_linearStatistics.cov