Title: | Fits Heterogeneous Panel Data Models |
---|---|
Description: | Fits heterogeneous panel data models with interactive effects for linear, logistic, count, probit, and quantile regression, as well as for clustering. Based on Ando, T. and Bai, J. (2015) "A simple new test for slope homogeneity in panel data models with interactive effects" <doi: 10.1016/j.econlet.2015.09.019>, Ando, T. and Bai, J. (2015) "Asset Pricing with a General Multifactor Structure" <doi: 10.1093/jjfinex/nbu026>, Ando, T. and Bai, J. (2016) "Panel data models with grouped factor structure under unknown group membership" <doi: 10.1002/jae.2467>, Ando, T. and Bai, J. (2017) "Clustering huge number of financial time series: A panel data approach with high-dimensional predictors and factor structures" <doi: 10.1080/01621459.2016.1195743>, Ando, T. and Bai, J. (2020) "Quantile co-movement in financial markets" <doi: 10.1080/01621459.2018.1543598>, and Ando, T., Bai, J. and Li, K. (2021) "Bayesian and maximum likelihood analysis of large-scale panel choice models with unobserved heterogeneity" <doi: 10.1016/j.jeconom.2020.11.013>. |
Authors: | Tomohiro Ando [aut, cre], Hani Fayad [aut] |
Maintainer: | Tomohiro Ando <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.0 |
Built: | 2025-02-12 04:56:04 UTC |
Source: | https://github.com/tomohiro-ando/pdmif |
A synthesized input variable dataset to fit a linear model on a panel dataset.
data1X
A data frame with 5,000 rows and 2 columns:
columns: the two independent variables
rows: each consecutive block of 100 rows holds the time series of one of the 50 individuals
...
A synthesized output variable dataset to fit a linear model on a panel dataset.
data1Y
A data frame with 100 rows and 50 columns:
columns: the individuals
rows: the time points in the time series of each individual
...
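The input and output data sets are linked by position: column i of data1Y and the i-th block of 100 rows of data1X describe the same individual. The base R sketch below illustrates this indexing on randomly generated stand-ins of the same shape. The helper rows_of() and all variable names are hypothetical, and the pairing assumes the row blocks of X follow the column order of Y.

```r
# Hypothetical stand-ins with the same shapes as data1X and data1Y
set.seed(1)
N  <- 50   # individuals
Tn <- 100  # time points per individual
X <- data.frame(x1 = rnorm(N * Tn), x2 = rnorm(N * Tn))        # (N*Tn) x 2
Y <- as.data.frame(matrix(rnorm(Tn * N), nrow = Tn, ncol = N))  # Tn x N

# Rows of X that belong to individual i: the i-th consecutive block of Tn rows
rows_of <- function(i, Tn) ((i - 1) * Tn + 1):(i * Tn)

i   <- 7
X_i <- X[rows_of(i, Tn), ]  # Tn x 2 regressors of individual i
y_i <- Y[, i]               # length-Tn response of individual i

# Stacking the columns of Y gives a length N*Tn vector aligned with the rows of X
y_stacked <- unlist(Y, use.names = FALSE)
```

The same block convention applies to the other X/Y pairs below, with the block length equal to the number of rows of the corresponding Y.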
A synthesized input variable dataset to fit a binomial model on a panel dataset.
data2X
A data frame with 5,000 rows and 2 columns:
columns: the two independent variables
rows: each consecutive block of 50 rows holds the time series of one of the 100 individuals
...
A synthesized output variable dataset to fit a binomial model on a panel dataset.
data2Y
A data frame with 50 rows and 100 columns:
columns: the individuals
rows: the time points in the time series of each individual
...
A synthesized input variable dataset to fit a Poisson model on a panel dataset.
data3X
A data frame with 5,000 rows and 3 columns:
columns: the three independent variables
rows: each consecutive block of 50 rows holds the time series of one of the 100 individuals
...
A synthesized output variable dataset to fit a Poisson model on a panel dataset.
data3Y
A data frame with 50 rows and 100 columns:
columns: the individuals
rows: the time points in the time series of each individual
...
A synthesized vector of memberships needed to fit a linear model on a panel dataset under known group memberships.
data4LAB
A vector with 300 entries indicating the group membership of each individual.
A synthesized input variable dataset to fit a linear model on a panel dataset under known group memberships.
data4X
A data frame with 30,000 rows and 2 columns:
columns: the two independent variables
rows: each consecutive block of 100 rows holds the time series of one of the 300 individuals
...
A synthesized output variable dataset to fit a linear model on a panel dataset under known group memberships.
data4Y
A data frame with 100 rows and 300 columns:
columns: the individuals
rows: the time points in the time series of each individual
...
A synthesized input variable dataset to cluster individuals by heterogeneous panel data models with interactive effects.
data5X
A data frame with 30,000 rows and 2 columns:
columns: the two independent variables
rows: each consecutive block of 100 rows holds the time series of one of the 300 individuals
...
A synthesized output variable dataset to cluster individuals by heterogeneous panel data models with interactive effects.
data5Y
A data frame with 100 rows and 300 columns:
columns: the individuals
rows: the time points in the time series of each individual
...
A synthesized input variable dataset to cluster individual units by nonlinear heterogeneous panel data models with interactive effects when the group membership is unknown.
data6X
A data frame with 4,500 rows and 2 columns:
columns: the two independent variables
rows: each consecutive block of 50 rows holds the time series of one of the 90 individuals
...
A synthesized output variable dataset to cluster individual units by nonlinear heterogeneous panel data models with interactive effects when the group membership is unknown.
data6Y
A data frame with 50 rows and 90 columns:
columns: the individuals
rows: the time points in the time series of each individual
...
A synthesized input variable dataset to fit a quantile panel data model on a panel dataset.
data7X
A data frame with 20,000 rows and 3 columns:
columns: the three independent variables
rows: each consecutive block of 100 rows holds the time series of one of the 200 individuals
...
A synthesized output variable dataset to fit a quantile panel data model on a panel dataset.
data7Y
A data frame with 100 rows and 200 columns:
columns: the individuals
rows: the time points in the time series of each individual
...
A synthesized output variable dataset to fit a quantile VAR model with interactive effects and a lag order of 2.
data8Y
A data frame with 102 rows and 15 columns:
columns: the individuals
rows: the time points in the time series of each individual
...
This function tests homogeneity of the regression coefficients in heterogeneous panel data models with interactive effects.
HOMTEST(X, Y, Nfactors, Maxit = 100, tol = 0.001)
Argument | Description |
---|---|
X | The (NT) × p design matrix, without an intercept, where N = number of individuals, T = length of the time series, and p = number of explanatory variables. |
Y | The T × N panel of responses, where N = number of individuals and T = length of the time series. |
Nfactors | A pre-specified number of common factors. |
Maxit | The maximum number of iterations in the optimization. Default is 100. |
tol | The convergence tolerance. Default is 0.001. |
A list with the following components:
Coefficients: The estimated heterogeneous coefficients.
Factors: The estimated common factors across groups.
Loadings: The estimated factor loadings for the common factors.
pvalue: The p-value of the homogeneity test.
Ando, T. and Bai, J. (2015) A simple new test for slope homogeneity in panel data models with interactive effects. Economics Letters, 136, 112-117.
fit <- HOMTEST(data1X,data1Y,2,20,0.5)
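For intuition about what a slope-homogeneity test measures, the sketch below computes a generic Swamy-type dispersion statistic, standardized as in the Pesaran and Yamagata delta test: individual estimates are compared with their precision-weighted pooled mean, and under homogeneity the standardized statistic is approximately standard normal for large N, with large values rejecting. This is a swapped-in textbook construction, not necessarily the statistic HOMTEST computes. The function name swamy_delta() and its inputs (a p × N coefficient matrix and a list of per-individual p × p covariance matrices) are assumptions.

```r
swamy_delta <- function(B, V) {
  p <- nrow(B); N <- ncol(B)
  cols <- lapply(seq_len(N), function(i) B[, i])  # individual estimates b_i
  Vinv <- lapply(V, solve)                        # precision matrices V_i^-1
  # Precision-weighted pooled estimate: (sum V_i^-1)^-1 sum V_i^-1 b_i
  b_bar <- solve(Reduce(`+`, Vinv),
                 Reduce(`+`, Map(function(P, b) P %*% b, Vinv, cols)))
  # Dispersion of the individual estimates around the pooled estimate
  S <- sum(mapply(function(P, b) {
    d <- b - b_bar
    drop(t(d) %*% P %*% d)
  }, Vinv, cols))
  # Standardize: (S - N p) / sqrt(2 N p), one-sided test in the upper tail
  delta <- (S - N * p) / sqrt(2 * N * p)
  list(statistic = delta, pvalue = pnorm(delta, lower.tail = FALSE))
}

# Identical slopes for all individuals: no dispersion, so the p-value is near 1
p <- 2; N <- 8
B <- matrix(rep(c(1, -0.5), N), nrow = p)
V <- replicate(N, diag(0.1, p), simplify = FALSE)
res <- swamy_delta(B, V)
```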
This function tests homogeneity of the regression coefficients in heterogeneous generalized linear models with interactive effects.
HOMTESTGLM(X, Y, FAMILY, Nfactors, Maxit = 100, tol = 0.001)
Argument | Description |
---|---|
X | The (NT) × p design matrix, without an intercept, where N = number of individuals, T = length of the time series, and p = number of explanatory variables. |
Y | The T × N panel of responses, where N = number of individuals and T = length of the time series. |
FAMILY | A description of the error distribution and link function to be used in the model, specified as for glm (e.g., binomial(link=logit)). |
Nfactors | A pre-specified number of common factors. |
Maxit | The maximum number of iterations in the optimization. Default is 100. |
tol | The convergence tolerance. Default is 0.001. |
A list with the following components:
Coefficients: The estimated heterogeneous coefficients.
Factors: The estimated common factors across groups.
Loadings: The estimated factor loadings for the common factors.
pvalue: The p-value of the homogeneity test.
Ando, T. and Bai, J. (2015) A simple new test for slope homogeneity in panel data models with interactive effects. Economics Letters, 136, 112-117.
fit <- HOMTESTGLM(data2X,data2Y,binomial(link=logit),2,10,0.5)
This function performs hypothesis tests on regression coefficients obtained from the estimation functions in the package.
HYPTEST( B, B0, Se, test = "two", variables = seq(1, nrow(B)), individuals = seq(1, ncol(B)) )
Argument | Description |
---|---|
B | A data frame of coefficients, as returned in the Coefficients component of the package's estimation functions. |
B0 | A data frame of hypothesized coefficient values to be tested. Its number of rows should match the number of tested variables, and its number of columns the number of tested individuals. |
Se | A data frame of standard errors, as returned in the Se component of the package's estimation functions. |
test | The type of test: "two" for two-tailed, "right" for right-tailed, or "left" for left-tailed. |
variables | A vector of indices of the variables (rows of B) whose coefficients are to be tested. Default is all variables in B. |
individuals | A vector of indices of the individuals (columns of B) whose coefficients are to be tested. Default is all individuals in B. |
A dataframe of p-values resulting from each individual test.
fit <- PDMIFLOGIT(data2X,data2Y,2,20,0.5)
HYPTEST(fit$Coefficients,data.frame(c(0,1),c(-1,2)),fit$Se,"two",c(1,3),c(1,2))
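A minimal sketch of how the three test options translate into p-values, assuming a z statistic z = (B - B0) / Se compared with the standard normal distribution (an assumption: the reference distribution HYPTEST uses is not stated here). The function hyp_pvalues() is hypothetical and only mirrors the argument names above; it is not the package's implementation.

```r
hyp_pvalues <- function(B, B0, Se, test = c("two", "right", "left"),
                        variables = seq_len(nrow(B)),
                        individuals = seq_len(ncol(B))) {
  test <- match.arg(test)
  # Keep only the tested variables (rows) and individuals (columns)
  b  <- as.matrix(B)[variables, individuals, drop = FALSE]
  se <- as.matrix(Se)[variables, individuals, drop = FALSE]
  z  <- (b - as.matrix(B0)) / se
  switch(test,
         two   = 2 * pnorm(-abs(z)),            # P(|Z| >= |z|)
         right = pnorm(z, lower.tail = FALSE),  # P(Z >= z)
         left  = pnorm(z))                      # P(Z <= z)
}

B  <- matrix(c(0.5, 1, 2, 3), nrow = 2)
Se <- matrix(1, nrow = 2, ncol = 2)
p_null <- hyp_pvalues(B, B, Se, "two")        # z = 0 everywhere
p_196  <- hyp_pvalues(B, B - 1.96, Se, "two") # z = 1.96 everywhere
```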
Given a pre-specified number of groups and numbers of common factors, this function clusters the N individuals in the panel. Each individual in a group is subject to that group's unobserved common factors.
PDMIFCLUST(X, Y, NGfactors, NLfactors, Maxit = 100, tol = 0.001)
Argument | Description |
---|---|
X | The (NT) × p design matrix, without an intercept, where N = number of individuals, T = length of the time series, and p = number of explanatory variables. |
Y | The T × N panel of responses, where N = number of individuals and T = length of the time series. |
NGfactors | A pre-specified number of common factors shared across groups (see example). |
NLfactors | A vector giving the pre-specified number of factors in each group, one entry per group (see example). |
Maxit | The maximum number of iterations in the optimization. Default is 100. |
tol | The convergence tolerance. Default is 0.001. |
A list with the following components:
Label: The estimated group membership for each of the individuals.
Coefficients: The estimated heterogeneous coefficients.
Lower05: Lower end (5%) of the 90% confidence interval of the regression coefficients.
Upper95: Upper end (95%) of the 90% confidence interval of the regression coefficients.
GlobalFactors: The estimated common factors across groups.
GlobalLoadings: The estimated factor loadings for the common factors.
GroupFactors: The estimated group-specific factors.
GroupLoadings: The estimated factor loadings for each group.
pval: p-values for testing hypotheses on the heterogeneous coefficients.
Se: Standard error of the estimated regression coefficients.
Ando, T. and Bai, J. (2016) Panel data models with grouped factor structure under unknown group membership. Journal of Applied Econometrics, 31, 163-191.
Ando, T. and Bai, J. (2017) Clustering huge number of financial time series: A panel data approach with high-dimensional predictors and factor structures. Journal of the American Statistical Association, 112, 1182-1198.
fit <- PDMIFCLUST(data5X,data5Y,2,c(2,2,2),20,0.5)
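Clustering estimators of this type are commonly computed by alternating between estimating each group's model and reassigning every individual to the group whose model fits it best. The sketch below shows only a reassignment step, under simplifying assumptions: W holds responses with the regression part already removed, each group is summarized by a hypothetical T × r factor matrix, and fit is measured by the residual sum of squares after projecting an individual's series on those factors. The helpers proj_rss() and assign_groups() are illustrative, not functions of the package.

```r
# Residual sum of squares of y after least-squares projection on the columns of Fg
proj_rss <- function(y, Fg) {
  fitted <- Fg %*% qr.solve(Fg, y)
  sum((y - fitted)^2)
}

# W: T x N residual panel; F_list: one T x r factor matrix per group
assign_groups <- function(W, F_list) {
  rss <- sapply(F_list, function(Fg) apply(W, 2, proj_rss, Fg = Fg))  # N x S
  max.col(-rss, ties.method = "first")  # column (group) with the smallest RSS
}

tt <- 1:20
F1 <- cbind(sin(tt))                               # group 1 factor
F2 <- cbind(cos(tt))                               # group 2 factor
W  <- cbind(2 * sin(tt), 3 * cos(tt), -sin(tt))   # three individuals
labels <- assign_groups(W, list(F1, F2))
```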
Given a pre-specified number of groups and numbers of group-specific factors, this function clusters the N individual units using nonlinear heterogeneous panel data models with interactive effects, with the response following an exponential-family distribution. Each individual in a group is subject to that group's unobserved common factors.
PDMIFCLUSTGLM(X, Y, FAMILY, NLfactors, Maxit = 100, tol = 0.001)
Argument | Description |
---|---|
X | The (NT) × p design matrix, without an intercept, where N = number of individuals, T = length of the time series, and p = number of explanatory variables. |
Y | The T × N panel of responses, where N = number of individuals and T = length of the time series. |
FAMILY | A description of the error distribution and link function to be used in the model, specified as for glm (e.g., binomial(link=logit)). |
NLfactors | A vector giving the pre-specified number of factors in each group, one entry per group (see example). |
Maxit | The maximum number of iterations in the optimization. Default is 100. |
tol | The convergence tolerance. Default is 0.001. |
A list with the following components:
Label: The estimated group membership for each of the individuals.
Coefficients: The estimated heterogeneous coefficients.
Lower05: Lower end (5%) of the 90% confidence interval of the regression coefficients.
Upper95: Upper end (95%) of the 90% confidence interval of the regression coefficients.
GroupFactors: The estimated group-specific factors.
GroupLoadings: The estimated factor loadings for each group.
pval: p-values for testing hypotheses on the heterogeneous coefficients.
Se: Standard error of the estimated regression coefficients.
Ando, T. and Bai, J. (2016) Panel data models with grouped factor structure under unknown group membership. Journal of Applied Econometrics, 31, 163-191.
Ando, T. and Bai, J. (2017) Clustering huge number of financial time series: A panel data approach with high-dimensional predictors and factor structures. Journal of the American Statistical Association, 112, 1182-1198.
fit <- PDMIFCLUSTGLM(data6X,data6Y,binomial(link=logit),c(1,1,1),3,0.5)
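In a likelihood-based clustering procedure, a natural reassignment rule places each individual in the group under which its log-likelihood is largest. The sketch below assumes a binary response with a logit link, as in the example above, and a hypothetical list of T × N matrices holding each group's fitted linear index for every individual. The helpers bern_loglik() and assign_glm_groups() are illustrative names, not the package's API.

```r
# Bernoulli log-likelihood under a logit link:
# sum(y * eta - log(1 + exp(eta))), with log(1 + exp(eta)) computed stably
bern_loglik <- function(y, eta) {
  sum(y * eta - (pmax(eta, 0) + log1p(exp(-abs(eta)))))
}

# Y: T x N binary panel; eta_list: one T x N linear-index matrix per group
assign_glm_groups <- function(Y, eta_list) {
  N <- ncol(Y)
  ll <- sapply(eta_list, function(E)
    vapply(seq_len(N), function(i) bern_loglik(Y[, i], E[, i]), numeric(1)))
  max.col(ll, ties.method = "first")  # group with the largest log-likelihood
}

Y  <- cbind(c(1, 1, 0, 0), c(0, 0, 1, 1))
E1 <- cbind(c(3, 3, -3, -3), c(3, 3, -3, -3))  # group 1 index
E2 <- -E1                                      # group 2 index
labels <- assign_glm_groups(Y, list(E1, E2))
```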
This function estimates heterogeneous Poisson panel data models with interactive effects.
PDMIFCOUNT(X, Y, Nfactors, Maxit = 100, tol = 0.001)
Argument | Description |
---|---|
X | The (NT) × p design matrix, without an intercept, where N = number of individuals, T = length of the time series, and p = number of explanatory variables. |
Y | The T × N panel of responses, where N = number of individuals and T = length of the time series. |
Nfactors | A pre-specified number of common factors. |
Maxit | The maximum number of iterations in the optimization. Default is 100. |
tol | The convergence tolerance. Default is 0.001. |
A list with the following components:
Coefficients: The estimated heterogeneous coefficients.
Lower05: Lower end (5%) of the 90% confidence interval of the regression coefficients.
Upper95: Upper end (95%) of the 90% confidence interval of the regression coefficients.
Factors: The estimated common factors across groups.
Loadings: The estimated factor loadings for the common factors.
Predict: The conditional expectation of response variable.
pval: p-values for testing hypotheses on the heterogeneous coefficients.
Se: Standard error of the estimated regression coefficients.
Ando, T., Bai, J. and Li, K. (2021) Bayesian and maximum likelihood analysis of large-scale panel choice models with unobserved heterogeneity, Journal of Econometrics.
fit <- PDMIFCOUNT(data3X,data3Y,3,30,0.5)
This function estimates heterogeneous panel data models with interactive effects through generalised linear models.
PDMIFGLM(X, Y, FAMILY, Nfactors, Maxit = 100, tol = 0.001)
Argument | Description |
---|---|
X | The (NT) × p design matrix, without an intercept, where N = number of individuals, T = length of the time series, and p = number of explanatory variables. |
Y | The T × N panel of responses, where N = number of individuals and T = length of the time series. |
FAMILY | A description of the error distribution and link function to be used in the model, specified as for glm (e.g., binomial(link=logit)). |
Nfactors | A pre-specified number of common factors. |
Maxit | The maximum number of iterations in the optimization. Default is 100. |
tol | The convergence tolerance. Default is 0.001. |
A list with the following components:
Coefficients: The estimated heterogeneous coefficients.
Lower05: Lower end (5%) of the 90% confidence interval of the regression coefficients.
Upper95: Upper end (95%) of the 90% confidence interval of the regression coefficients.
Factors: The estimated common factors across groups.
Loadings: The estimated factor loadings for the common factors.
Predict: The conditional expectation of response variable.
pval: p-values for testing hypotheses on the heterogeneous coefficients.
Se: Standard error of the estimated regression coefficients.
Ando, T., Bai, J. and Li, K. (2021) Bayesian and maximum likelihood analysis of large-scale panel choice models with unobserved heterogeneity, Journal of Econometrics.
fit <- PDMIFGLM(data2X,data2Y,binomial(link=logit),2,20,0.5)
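The FAMILY argument takes a standard R family object such as binomial(link=logit), and every such object carries a linkinv function that maps a linear index to a mean. Assuming Predict is obtained by applying this inverse link to an index of the form x_it'β_i + f_t'λ_i (an assumption about the model's parameterization), the sketch below shows the logit, probit and log (Poisson) cases for one individual with hypothetical parameter values.

```r
Tn <- 5
x_i      <- cbind(seq(-1, 1, length.out = Tn), cos(seq_len(Tn)))  # hypothetical regressors
beta_i   <- c(0.8, -0.2)              # hypothetical slopes
f        <- cbind(sin(seq_len(Tn)))   # hypothetical single common factor
lambda_i <- 0.5                       # hypothetical loading

# Linear index x_it' beta_i + f_t' lambda_i for t = 1, ..., Tn
eta <- drop(x_i %*% beta_i + f %*% lambda_i)

# Conditional means via each family's inverse link
means <- list(
  logit   = binomial(link = "logit")$linkinv(eta),
  probit  = binomial(link = "probit")$linkinv(eta),
  poisson = poisson(link = "log")$linkinv(eta)
)
```

The same index can thus serve the logistic, probit and count models; only the inverse link changes.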
This function estimates heterogeneous panel data models with interactive effects. It is similar to PDMIFLING, which additionally accommodates a group structure.
PDMIFLIN(X, Y, Nfactors, Maxit = 100, tol = 0.001)
Argument | Description |
---|---|
X | The (NT) × p design matrix, without an intercept, where N = number of individuals, T = length of the time series, and p = number of explanatory variables. |
Y | The T × N panel of responses, where N = number of individuals and T = length of the time series. |
Nfactors | A pre-specified number of common factors. |
Maxit | The maximum number of iterations in the optimization. Default is 100. |
tol | The convergence tolerance. Default is 0.001. |
A list with the following components:
Coefficients: The estimated heterogeneous coefficients.
Lower05: Lower end (5%) of the 90% confidence interval of the regression coefficients.
Upper95: Upper end (95%) of the 90% confidence interval of the regression coefficients.
Factors: The estimated common factors across groups.
Loadings: The estimated factor loadings for the common factors.
Predict: The conditional expectation of response variable.
pval: p-values for testing hypotheses on the heterogeneous coefficients.
Se: Standard error of the estimated regression coefficients.
Ando, T. and Bai, J. (2015) Asset Pricing with a General Multifactor Structure. Journal of Financial Econometrics, 13, 556-604.
fit <- PDMIFLIN(data1X,data1Y,2)
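Assuming the model y_it = x_it'β_i + λ_i'f_t + e_it, heterogeneous slopes and factors can be estimated by alternating two least-squares steps: principal components of the residuals y_it - x_it'β_i give the factors and loadings, and individual-by-individual OLS of y_it - λ_i'f_t on x_it updates the slopes. The base R sketch below implements this iteration under the data layout used in this package (X stacked in blocks of T rows, Y as T × N). It is an illustrative stand-in, not the code behind PDMIFLIN, and ife_fit() is a hypothetical name.

```r
ife_fit <- function(X, Y, r, maxit = 100, tol = 1e-3) {
  X <- as.matrix(X); Y <- as.matrix(Y)
  Tn <- nrow(Y); N <- ncol(Y); p <- ncol(X)
  rows_of <- function(i) ((i - 1) * Tn + 1):(i * Tn)
  B <- matrix(0, p, N)  # start from zero slopes
  for (it in seq_len(maxit)) {
    # Remove the regression part: W[t, i] = y_it - x_it' beta_i
    XB <- vapply(seq_len(N), function(i)
      drop(X[rows_of(i), , drop = FALSE] %*% B[, i]), numeric(Tn))
    W <- Y - XB
    # Principal components, normalized so that F'F / T = I
    Fmat <- sqrt(Tn) * eigen(tcrossprod(W), symmetric = TRUE)$vectors[, seq_len(r), drop = FALSE]
    Lam  <- crossprod(W, Fmat) / Tn  # N x r loadings
    # Per-individual OLS on the de-factored response
    B_new <- vapply(seq_len(N), function(i)
      qr.solve(X[rows_of(i), , drop = FALSE], Y[, i] - drop(Fmat %*% Lam[i, ])),
      numeric(p))
    B_new <- matrix(B_new, nrow = p)
    done <- max(abs(B_new - B)) < tol
    B <- B_new
    if (done) break
  }
  list(coef = B, factors = Fmat, loadings = Lam, iterations = it)
}

# Simulated panel: one regressor, one factor, heterogeneous slopes
set.seed(42)
Tn <- 60; N <- 20
f <- rnorm(Tn); lam <- rnorm(N); beta <- runif(N, 0.5, 1.5)
X <- matrix(rnorm(Tn * N), ncol = 1)
Y <- sapply(seq_len(N), function(i)
  X[((i - 1) * Tn + 1):(i * Tn), 1] * beta[i] + f * lam[i] + 0.1 * rnorm(Tn))
fit <- ife_fit(X, Y, r = 1)
```

Because only the product of factors and loadings enters the update, the sign and rotation indeterminacy of principal components does not affect the slope estimates.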
Under a known group membership, this function estimates heterogeneous panel data models with interactive effects. Together with the regression coefficients, it estimates the unobserved common factor structures both across and within groups.
PDMIFLING(X, Y, Membership, NGfactors, NLfactors, Maxit = 100, tol = 0.001)
Argument | Description |
---|---|
X | The (NT) × p design matrix, without an intercept, where N = number of individuals, T = length of the time series, and p = number of explanatory variables. |
Y | The T × N panel of responses, where N = number of individuals and T = length of the time series. |
Membership | A vector of pre-specified group memberships, one entry per individual (e.g., data4LAB). |
NGfactors | A pre-specified number of common factors shared across groups (see example). |
NLfactors | A vector giving the pre-specified number of factors in each group, one entry per group (see example). |
Maxit | The maximum number of iterations in the optimization. Default is 100. |
tol | The convergence tolerance. Default is 0.001. |
A list with the following components:
Coefficients: The estimated heterogeneous coefficients.
Lower05: Lower end (5%) of the 90% confidence interval of the regression coefficients.
Upper95: Upper end (95%) of the 90% confidence interval of the regression coefficients.
GlobalFactors: The estimated common factors across groups.
GlobalLoadings: The estimated factor loadings for the common factors.
GroupFactors: The estimated group-specific factors.
GroupLoadings: The estimated factor loadings for each group.
pval: p-values for testing hypotheses on the heterogeneous coefficients.
Se: Standard error of the estimated regression coefficients.
Ando, T. and Bai, J. (2015) Asset Pricing with a General Multifactor Structure. Journal of Financial Econometrics, 13, 556-604.
fit <- PDMIFLING(data4X,data4Y,data4LAB,2,c(2,2,2),30,0.1)
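With a known membership, the factor part can be split into factors shared by all individuals and factors specific to each group. The sketch below performs one pass of a two-step principal-components extraction on a T × N residual matrix W whose regression part is assumed to be already removed: global factors are taken from all columns, then group factors from each group's columns after the global component is projected out. It mirrors the argument layout of the example (NGfactors = 2, NLfactors = c(2,2,2)), but it is a hypothetical simplification; a full estimator would iterate these steps jointly with the coefficients. The names pc_factors() and grouped_factors() are illustrative.

```r
# Leading r principal-component factors of a T x n matrix, with F'F / T = I
pc_factors <- function(M, r) {
  sqrt(nrow(M)) * eigen(tcrossprod(M), symmetric = TRUE)$vectors[, seq_len(r), drop = FALSE]
}

grouped_factors <- function(W, membership, r_global, r_group) {
  G <- pc_factors(W, r_global)                 # global factors from all individuals
  R <- W - G %*% crossprod(G, W) / nrow(W)     # project out the global component
  groups <- sort(unique(membership))
  group_F <- lapply(seq_along(groups), function(j)
    pc_factors(R[, membership == groups[j], drop = FALSE], r_group[j]))
  list(global = G, group = group_F)
}

set.seed(7)
W <- matrix(rnorm(40 * 30), nrow = 40, ncol = 30)
membership <- rep(1:3, each = 10)
out <- grouped_factors(W, membership, r_global = 2, r_group = c(2, 2, 2))
```

Because the group factors are extracted after projecting out the global ones, they are orthogonal to the global factors by construction.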
This function estimates heterogeneous logistic panel data models with interactive effects.
PDMIFLOGIT(X, Y, Nfactors, Maxit = 100, tol = 0.001)
Argument | Description |
---|---|
X | The (NT) × p design matrix, without an intercept, where N = number of individuals, T = length of the time series, and p = number of explanatory variables. |
Y | The T × N panel of responses, where N = number of individuals and T = length of the time series. |
Nfactors | A pre-specified number of common factors. |
Maxit | The maximum number of iterations in the optimization. Default is 100. |
tol | The convergence tolerance. Default is 0.001. |
A list with the following components:
Coefficients: The estimated heterogeneous coefficients.
Lower05: Lower end (5%) of the 90% confidence interval of the regression coefficients.
Upper95: Upper end (95%) of the 90% confidence interval of the regression coefficients.
Factors: The estimated common factors across groups.
Loadings: The estimated factor loadings for the common factors.
Predict: The conditional expectation of response variable.
pval: p-values for testing hypotheses on the heterogeneous coefficients.
Se: Standard error of the estimated regression coefficients.
Ando, T., Bai, J. and Li, K. (2021) Bayesian and maximum likelihood analysis of large-scale panel choice models with unobserved heterogeneity, Journal of Econometrics.
fit <- PDMIFLOGIT(data2X,data2Y,2,20,0.5)
This function estimates heterogeneous probit panel data models with interactive effects.
PDMIFPROBIT(X, Y, Nfactors, Maxit = 100, tol = 0.001)
Argument | Description |
---|---|
X | The (NT) × p design matrix, without an intercept, where N = number of individuals, T = length of the time series, and p = number of explanatory variables. |
Y | The T × N panel of responses, where N = number of individuals and T = length of the time series. |
Nfactors | A pre-specified number of common factors. |
Maxit | The maximum number of iterations in the optimization. Default is 100. |
tol | The convergence tolerance. Default is 0.001. |
A list with the following components:
Coefficients: The estimated heterogeneous coefficients.
Lower05: Lower end (5%) of the 90% confidence interval of the regression coefficients.
Upper95: Upper end (95%) of the 90% confidence interval of the regression coefficients.
Factors: The estimated common factors across groups.
Loadings: The estimated factor loadings for the common factors.
Predict: The conditional expectation of response variable.
pval: p-values for testing hypotheses on the heterogeneous coefficients.
Se: Standard error of the estimated regression coefficients.
Ando, T., Bai, J. and Li, K. (2021) Bayesian and maximum likelihood analysis of large-scale panel choice models with unobserved heterogeneity, Journal of Econometrics.
fit <- PDMIFPROBIT(data2X,data2Y,2,20,0.5)
This function estimates heterogeneous quantile panel data models with interactive effects.
PDMIFQUANTILE(X, Y, TAU, Nfactors, Maxit = 100, tol = 0.001)
Argument | Description |
---|---|
X | The (NT) × p design matrix, without an intercept, where N = number of individuals, T = length of the time series, and p = number of explanatory variables. |
Y | The T × N panel of responses, where N = number of individuals and T = length of the time series. |
TAU | A pre-specified quantile level, strictly between 0 and 1. |
Nfactors | A pre-specified number of common factors. |
Maxit | The maximum number of iterations in the optimization. Default is 100. |
tol | The convergence tolerance. Default is 0.001. |
A list with the following components:
Coefficients: The estimated heterogeneous coefficients.
Lower05: Lower end (5%) of the 90% confidence interval of the regression coefficients.
Upper95: Upper end (95%) of the 90% confidence interval of the regression coefficients.
Factors: The estimated common factors across groups.
Loadings: The estimated factor loadings for the common factors.
Predict: The estimated conditional quantile of the response variable at the given TAU.
pval: p-values for testing hypotheses on the heterogeneous coefficients.
Se: Standard error of the estimated regression coefficients.
Ando, T. and Bai, J. (2020) Quantile co-movement in financial markets. Journal of the American Statistical Association.
fit <- PDMIFQUANTILE(data7X,data7Y,0.95,2,10,0.8)
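Quantile models replace squared error with the check (pinball) loss rho_tau(u) = u (tau - 1{u < 0}), which weights positive residuals by tau and negative residuals by 1 - tau. The base R sketch below defines this loss and checks on a toy sample that minimizing it over a constant recovers the tau-th sample quantile. check_loss() is a hypothetical helper, not a package function, and the toy sample stands in for one individual's series.

```r
# Check loss: rho_tau(u) = u * (tau - 1{u < 0})
check_loss <- function(u, tau) u * (tau - (u < 0))

y   <- 1:100
tau <- 0.9
# Total loss of a constant fit q, evaluated at each sample point
obj   <- vapply(y, function(q) sum(check_loss(y - q, tau)), numeric(1))
q_hat <- y[which.min(obj)]  # any value in [90, 91] minimizes the loss here
```

In the panel model, the constant q is replaced by the index x_it'β_i + λ_i'f_t, but the loss being minimized is the same.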
This function estimates heterogeneous quantile panel data VAR models with interactive effects.
PDMIFQVAR(Y, LAG, TAU, Nfactors, Maxit = 100, tol = 0.001)
Argument | Description |
---|---|
Y | The T × N panel of responses, where N = number of individuals and T = length of the time series. |
LAG | The number of lags used in the VAR; the lagged regressors are y_{t-1}, ..., y_{t-LAG}. |
TAU | A pre-specified quantile level, strictly between 0 and 1. |
Nfactors | A pre-specified number of common factors. |
Maxit | The maximum number of iterations in the optimization. Default is 100. |
tol | The convergence tolerance. Default is 0.001. |
A list with the following components:
Coefficients: The estimated heterogeneous coefficients.
Lower05: Lower end (5%) of the 90% confidence interval of the regression coefficients.
Upper95: Upper end (95%) of the 90% confidence interval of the regression coefficients.
Factors: The estimated common factors across groups.
Loadings: The estimated factor loadings for the common factors.
Predict: The estimated conditional quantile of the response variable at the given TAU.
pval: p-values for testing hypotheses on the heterogeneous coefficients.
Se: Standard error of the estimated regression coefficients.
Ando, T. and Bai, J. (2020) Quantile co-movement in financial markets. Journal of the American Statistical Association.
fit <- PDMIFQVAR(data8Y,2,0.1,2,5,0.8)
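A VAR of order LAG uses y_{t-1}, ..., y_{t-LAG} as regressors, so the first LAG rows of Y serve only as initial values: with data8Y (102 rows) and LAG = 2, 100 time points remain for estimation. The sketch below builds the aligned response and lagged blocks with base R's embed(). lag_design() is a hypothetical helper, and whether each equation uses all N lagged series or only its own is left open here.

```r
lag_design <- function(Y, LAG) {
  Y <- as.matrix(Y)
  N <- ncol(Y)
  # embed() returns rows [y_t, y_{t-1}, ..., y_{t-LAG}], each block N columns wide
  E <- embed(Y, LAG + 1)
  list(response = E[, seq_len(N), drop = FALSE],   # y_t for t = LAG+1, ..., T
       lags     = E[, -seq_len(N), drop = FALSE])  # y_{t-1} block, then y_{t-2}, ...
}

set.seed(3)
Y <- matrix(rnorm(102 * 15), nrow = 102, ncol = 15)  # same shape as data8Y
d <- lag_design(Y, LAG = 2)
dim(d$response)  # 100 rows, 15 columns
dim(d$lags)      # 100 rows, 30 columns
```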