Documentation for the "CC" folder

This toolkit consists of C++ classes for Gaussian process regression (GPR) and Bayesian scaling analysis (BSA), together with application codes for finite-size scaling analysis. BSA is a new method of statistical inference for the scaling analysis of critical phenomena. It is based on Bayesian statistics, more specifically on GPR. BSA assumes only that the scaling function is smooth; it does not need an explicit functional form. Thus, it may be more effective than conventional approaches to the scaling analysis of critical phenomena. The details of BSA are given in the paper (Kenji Harada, Physical Review E 84 (2011) 056704).

Because BSA is a new technique, this toolkit serves as the reference code, but it is still an alpha version. Comments and suggestions are welcome. At present, only C++ code is provided. I hope to see implementations in other languages such as Python, Fortran, C, or Perl. Contributions are welcome.

You can see demonstrations of BSA on this page. The second case in the demonstration corresponds to the inference process of the present application codes.

I hope that the BSA helps your study.

July 2013, Kenji Harada (@KenjiHarada)

Graduate school of informatics, Kyoto University
E-mail: harada@acs.i.kyoto-u.ac.jp
URL: https://www-np.acs.i.kyoto-u.ac.jp/~harada/index_en.html

Note

In some cases, the estimated confidence intervals of the parameters may be poor if the MC samples always stay near the best parameter point. In MCMC, the samples are correlated. In a bad case, the correlations between samples are so strong that we cannot obtain many statistically independent samples. We then need to tune the number of MC steps to resolve this. However, I do not yet know how to do this automatically. I recommend combining a systematic analysis of the confidence intervals with these application codes.
To help spread the Bayesian scaling analysis method, I hope you will cite the paper (Kenji Harada, Physical Review E 84 (2011) 056704) in your report or paper.
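One way to check how strongly the MC samples are correlated is to estimate the integrated autocorrelation time of each parameter chain and the resulting effective number of independent samples. The following is a minimal sketch of that standard diagnostic, not part of the toolkit; the function names and the truncation rule (stop at the first non-positive autocorrelation) are assumptions made for illustration.

```python
def autocorrelation(samples, lag):
    """Normalized autocorrelation of a 1D chain at the given lag."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    if var == 0.0:
        return 0.0
    cov = sum((samples[i] - mean) * (samples[i + lag] - mean)
              for i in range(n - lag)) / n
    return cov / var


def integrated_autocorrelation_time(samples, max_lag=None):
    """tau_int = 1/2 + sum of rho(t) for t >= 1.

    Assumption: the sum is truncated at the first non-positive rho(t).
    """
    n = len(samples)
    if max_lag is None:
        max_lag = n // 2
    tau = 0.5
    for lag in range(1, max_lag + 1):
        rho = autocorrelation(samples, lag)
        if rho <= 0.0:
            break
        tau += rho
    return tau


def effective_sample_size(samples):
    """Approximate number of statistically independent samples in the chain."""
    return len(samples) / (2.0 * integrated_autocorrelation_time(samples))
```

If the effective sample size is much smaller than the number of MC samples, the chain is strongly correlated and the reported confidence interval should be treated with caution.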

I. Application codes

This toolkit provides applications for finite-size scaling analysis, extrapolation of a data set, and finding the crossing point of two data sets.

bfss: Maximum likelihood estimation of scaling parameters.
bfss_mc: Monte Carlo estimation of the mean and confidence interval of scaling parameters.
bfss_c: Maximum likelihood estimation of scaling parameters with correction to scaling.
bfss_c_mc: Monte Carlo estimation of confidence intervals of scaling parameters with correction to scaling.
bext: Bayesian extrapolation by GPR.
bcross: Find the crossing point of two data sets by Bayesian extrapolation.

Note: please use the latest version, which includes bug fixes.

Compile and test

First, we need to install the GSL library, which is required by the class GPR::Regression. After that, we need to set the variables GSL_DIR and BLAS_LIB in the Makefile in the CC folder.

To compile and test, use the "make" command as follows.

% cd CC 
% make
% make test

To see the result of the finite-size scaling of the Binder ratio of the Ising model, one can use "gnuplot" as follows.

% gnuplot
gnuplot> plot "test.op" u 1:2:3 i 0 w e
gnuplot> plot "test_mc.op" u 1:2:3 i 0 w e
gnuplot> plot "test_c.op" u 1:2:3 i 0 w e
gnuplot> plot "test_c_mc.op" u 1:2:3 i 0 w e

Usage

Usage: ./bfss [data_file] [three physical parameter sets] (option)[three hyper parameter sets]
Usage: ./bfss_mc [data_file] [three physical parameter sets] (option)[three hyper parameter sets]
Usage: ./bfss_c [data_file] [five physical parameter sets] (option)[three hyper parameter sets]
Usage: ./bfss_c_mc [data_file] [five physical parameter sets] (option)[three hyper parameter sets]

The command "bfss" finds the best finite-size scaling parameters. The command "bfss_mc" calculates the confidence intervals of the parameters by Monte Carlo sampling. The command "bfss_c" finds the best finite-size scaling parameters with correction to scaling. The command "bfss_c_mc" calculates the confidence intervals of the parameters by Monte Carlo sampling for the case of correction to scaling.

If [data_file] is '-', the data are read from STDIN. A parameter set is a pair consisting of a mask and the initial value of the parameter.

If mask = 0, the parameter is fixed at its initial value; if mask = 1, the parameter is unfixed.

Usually, it is better to start each hyper parameter from 1, because all data are normalized. If you do not give the hyper parameter sets, the default values are 1.

Example.
% ./bfss Test/Ising-square-Binder.dat 1 0.42 1 0.9 1 0.1 1 1 1 1 1 1 > test.op 2>test.log
% ./bfss_mc Test/Ising-square-Binder.dat 1 0.42 1 0.9 1 0.1 1 1 1 1 1 1 > test_mc.op 2>test_mc.log
% ./bfss_c Sample/sample-c.dat 1 0.28 1 0.9 0 0 1 1 1 2 1 1 1 1 1 1 > test_c.op 2>test_c.log
% ./bfss_c_mc Sample/sample-c.dat 1 0.28 1 0.9 0 0 1 1 1 2 1 1 1 1 1 1 > test_c_mc.op 2>test_c_mc.log
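The argument layout used in the examples above (a mask followed by an initial value for each parameter) can be assembled programmatically. The following is a small sketch; `parameter_sets` and `bfss_command` are hypothetical helper names, not part of the toolkit.

```python
def parameter_sets(pairs):
    """Flatten (mask, initial_value) pairs into command-line tokens."""
    tokens = []
    for mask, value in pairs:
        if mask not in (0, 1):
            raise ValueError("mask must be 0 (fixed) or 1 (unfixed)")
        tokens += [str(mask), str(value)]
    return tokens


def bfss_command(program, data_file, physical, hyper=()):
    """Build the argument list: program, data file, physical sets, hyper sets."""
    return [program, data_file] + parameter_sets(physical) + parameter_sets(hyper)


# Same arguments as the first example: T_c = 0.42, c_1 = 0.9, c_2 = 0.1, all unfixed.
command = bfss_command(
    "./bfss",
    "Test/Ising-square-Binder.dat",
    physical=[(1, 0.42), (1, 0.9), (1, 0.1)],
    hyper=[(1, 1), (1, 1), (1, 1)],
)
print(" ".join(command))
```

The resulting list could be passed to `subprocess.run`, with standard output and standard error redirected to files as in the shell examples.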

II. Physical parameters

Scaling law

The finite-size scaling form is written as \begin{equation} A(T, L) = L^{c_2} F[( T - T_c ) L^{c_1} ], \end{equation} where \( A \) is an observable. The triplet for each data point is defined as \begin{equation} X = (T - T_c ) (L/L_m)^{c_1} / R_X,\ Y = (A / (L/L_m)^{c_2} - Y_0)/R_Y,\ E = \delta A/ (L/L_m)^{c_2}/R_Y, \end{equation} where \( \delta A \) is the error of \( A \) and \( L_m \) is the largest \( L \). The scaling factor \( R_X \) is defined so that the width of \( X \) for \( L_m \) is 2. The scaling factor \( R_Y \) and the offset \( Y_0 \) are defined so that \( Y \) for \( L_m \) lies in [-1, 1]. The data ansatz is \begin{equation} Y \sim F(X) \pm E. \end{equation} The physical parameters are indexed as [0] = \( T_c \), [1] = \( c_1 \), and [2] = \( c_2 \).
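The normalization above can be sketched in a few lines. This is an illustration, not the toolkit's implementation: it assumes that \( R_X \) is half the range of the scaled \( X \) at \( L_m \), that \( Y_0 \) is the midpoint of the scaled \( Y \) at \( L_m \), and that \( R_Y \) is half its range. These choices satisfy the stated conditions, but the toolkit may define the factors differently. At least two distinct data points at \( L_m \) are needed.

```python
def normalized_triplets(data, t_c, c1, c2):
    """Map rows (L, T, A, dA) to normalized triplets (X, Y, E)."""
    l_m = max(row[0] for row in data)  # largest system size
    raw = []
    for L, T, A, dA in data:
        s = L / l_m
        raw.append((L, (T - t_c) * s ** c1, A / s ** c2, dA / s ** c2))

    # Assumed definitions of the normalization constants (see lead-in).
    xs = [x for L, x, y, e in raw if L == l_m]
    ys = [y for L, x, y, e in raw if L == l_m]
    r_x = (max(xs) - min(xs)) / 2.0   # width of X at L_m becomes 2
    y_0 = (max(ys) + min(ys)) / 2.0   # Y at L_m is centered on 0
    r_y = (max(ys) - min(ys)) / 2.0   # Y at L_m lies in [-1, 1]

    return [(x / r_x, (y - y_0) / r_y, e / r_y) for L, x, y, e in raw]
```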

Scaling law with correction to scaling

The finite-size scaling form with correction to scaling is written as \begin{equation} A(T, L) = L^{c_2} F[ ( T - T_c ) L^{c_1} ] ( 1 + a L^{-w} ), \end{equation} where \( A \) is an observable. The triplet for each data point is defined as \begin{eqnarray} X &=& (T - T_c ) (L/L_m)^{c_1} / R_X,\\ Y &=& (A / (L/L_m)^{c_2} \times [( 1 + a L_m^{-w}) / ( 1 + a L^{-w})] - Y_0)/R_Y,\\ E &=& \delta A/ (L/L_m)^{c_2} \times [( 1 + a L_m^{-w}) / ( 1 + a L^{-w})] /R_Y, \end{eqnarray} where \( \delta A \) is the error of \( A \) and \( L_m \) is the largest \( L \). The scaling factor \( R_X \) is defined so that the width of \( X \) for \( L_m \) is 2. The scaling factor \( R_Y \) and the offset \( Y_0 \) are defined so that \( Y \) for \( L_m \) lies in [-1, 1]. The data ansatz is \begin{equation} Y \sim F(X) \pm E. \end{equation} The physical parameters are indexed as [0] = \( T_c \), [1] = \( c_1 \), [2] = \( c_2 \), [3] = \( a \), and [4] = \( w \).
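The effect of the correction factor can be illustrated with a short sketch. The function names are hypothetical, and the code covers only the rescaling of \( A \) and \( \delta A \), before \( Y_0 \) is subtracted and the result is divided by \( R_Y \).

```python
def correction_factor(L, l_m, a, w):
    """Ratio (1 + a L_m^{-w}) / (1 + a L^{-w}); equals 1 at L = L_m."""
    return (1.0 + a * l_m ** (-w)) / (1.0 + a * L ** (-w))


def corrected_value_and_error(L, A, dA, l_m, c2, a, w):
    """Rescale A and its error by (L/L_m)^{-c2} times the correction factor."""
    scale = correction_factor(L, l_m, a, w) / (L / l_m) ** c2
    return A * scale, dA * scale


# Synthetic check: with a constant scaling function F = 2, data generated from
# A = L^{c2} F (1 + a L^{-w}) are mapped to the same value for every L.
c2, a, w, l_m = 0.5, 1.0, 2.0, 16
collapsed = []
for L in (4, 8, 16):
    A = L ** c2 * 2.0 * (1.0 + a * L ** (-w))
    collapsed.append(corrected_value_and_error(L, A, 0.0, l_m, c2, a, w)[0])
```

With the correct \( c_2 \), \( a \), and \( w \), all entries of `collapsed` agree up to rounding, which is the data collapse the scaling form predicts.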

Note

This code can be applied to any scaling analysis whose scaling law has the same form as the finite-size scaling law.

III. Hyper parameters

Kernel function

The kernel function is written as \begin{equation} k_G(i, j) = \delta_{ij} (E_i^2 + \theta_2^2) + \theta_0^2 \exp\left( - \frac{|X_i- X_j|^2}{2\theta_1^2} \right). \end{equation} The hyper parameters are indexed as [0] = \( \theta_0 \), [1] = \( \theta_1 \), and [2] = \( \theta_2 \).
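As an illustration, the kernel matrix can be built directly from the normalized triplets. The sketch below uses only the Python standard library and reads the exponent as \( -|X_i - X_j|^2 / (2\theta_1^2) \); the function name is hypothetical, and the code is not the toolkit's GPR::Regression class.

```python
import math


def kernel_matrix(xs, es, theta0, theta1, theta2):
    """Return the matrix k_G(i, j) for points xs with data errors es."""
    n = len(xs)
    k = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = xs[i] - xs[j]
            # Gaussian (squared-exponential) part, present for every pair.
            k[i][j] = theta0 ** 2 * math.exp(-d * d / (2.0 * theta1 ** 2))
            if i == j:
                # Diagonal part: data error plus the hyper parameter theta2.
                k[i][j] += es[i] ** 2 + theta2 ** 2
    return k
```

Here \( \theta_0 \) sets the amplitude of the scaling function, \( \theta_1 \) its length scale in \( X \), and \( \theta_2 \) an additional uncorrelated contribution on the diagonal.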

IV. Data file for input

Format

The format of the data file is as follows.

# L   T            A              Error_of_A
128   4.200000e-01 6.271240e-02   1.336090e-03

Each line must end with a newline character. Comment lines start with the character '#'. Empty lines are ignored. All other lines contain four values: \( L \) in the first column, \( T \) in the second, \( A \) in the third, and \( \delta A \) in the fourth. If a line is not correctly formatted, it is skipped.
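The format rules above can be mirrored by a small reader, for example to check a data file before running the commands. This is a sketch under assumptions: columns are taken to be whitespace-separated, and a line counts as incorrectly formatted when it does not hold exactly four numeric fields. The toolkit's own reader may apply different rules.

```python
def parse_data(lines):
    """Return a list of (L, T, A, dA) tuples from the lines of a data file."""
    rows = []
    for line in lines:
        text = line.strip()
        if not text or text.startswith("#"):
            continue  # empty line or comment line
        fields = text.split()
        if len(fields) != 4:
            continue  # assumed rule: wrong number of columns is skipped
        try:
            L, T, A, dA = (float(f) for f in fields)
        except ValueError:
            continue  # non-numeric field: skipped
        rows.append((L, T, A, dA))
    return rows


sample = [
    "# L   T            A              Error_of_A\n",
    "128   4.200000e-01 6.271240e-02   1.336090e-03\n",
    "\n",
    "this line is malformed\n",
]
rows = parse_data(sample)
```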

V. Output of commands

The best parameter values and their confidence intervals are written in the header as comments. The rest of the output consists of three groups, which are separated by two empty lines.

The first group is the scaling result in terms of the unnormalized variables \begin{equation} X = (T - T_c ) L^{c_1}, Y = A / L^{c_2}, E = \delta A/ L^{c_2}. \end{equation} In all groups, the best parameters are used. In the case of the commands "bfss_mc" and "bfss_c_mc", the averages of the parameters are used. Each output line contains a list of \( X, Y, E, L, T, A, \mbox{and}\ \delta A.\)

The second group consists of 100 points of the inferred scaling function in terms of the unnormalized variables. Each line contains a list of \( X, \mu(X), \sqrt{\sigma^2(X)} \).

The third group is the scaling result in terms of the normalized variables \begin{equation} X = (T - T_c ) (L/L_m)^{c_1} / R_X,\ Y = (A / (L/L_m)^{c_2} - Y_0)/R_Y,\ E = \delta A/ (L/L_m)^{c_2}/R_Y. \end{equation} Each line contains a list of \( X, Y, E, L, T, A, \mbox{and}\ \delta A.\)
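The three groups can be read back for further processing. The sketch below splits the output on runs of two or more empty lines, which is the same convention gnuplot uses for its "index" (i) option in the plot commands of the "Compile and test" part. The function name is hypothetical, and header comment lines are simply dropped.

```python
def split_groups(lines):
    """Split output lines into groups of numeric rows.

    Groups are separated by two or more consecutive empty lines;
    lines starting with '#' are treated as comments and ignored.
    """
    groups, current, blanks = [], [], 0
    for line in lines:
        text = line.strip()
        if text.startswith("#"):
            continue
        if not text:
            blanks += 1
            continue
        if blanks >= 2 and current:
            groups.append(current)  # close the previous group
            current = []
        blanks = 0
        current.append([float(v) for v in text.split()])
    if current:
        groups.append(current)
    return groups
```

For a file such as "test.op", `split_groups(open("test.op"))` would be expected to return three groups: the unnormalized data, the inferred scaling function, and the normalized data.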

In the case of correction to scaling

The first group is the scaling result in terms of the unnormalized variables \begin{equation} X = (T - T_c ) L^{c_1}, Y = A / L^{c_2} / ( 1 + a L^{-w}), E = \delta A/ L^{c_2} / ( 1 + a L^{-w}). \end{equation} The third group is the scaling result in terms of the normalized variables \begin{eqnarray} X &=& (T - T_c ) (L/L_m)^{c_1} / R_X,\\ Y &=& (A / (L/L_m)^{c_2} \times [( 1 + a L_m^{-w}) / ( 1 + a L^{-w})] - Y_0)/R_Y,\\ E &=& \delta A/ (L/L_m)^{c_2} \times [( 1 + a L_m^{-w}) / ( 1 + a L^{-w})] /R_Y. \end{eqnarray}