Generalized Linear Model (GLM)

Introduction

Generalized Linear Models (GLM) estimate regression models for outcomes following exponential family distributions. In addition to the Gaussian (i.e. normal) distribution, these include Poisson, binomial, and gamma distributions. Each serves a different purpose and, depending on the choice of distribution and link function, can be used for either prediction or classification.

The GLM suite includes:

  • Gaussian regression
  • Poisson regression
  • Binomial regression (classification)
  • Quasibinomial regression
  • Multinomial classification
  • Gamma regression
  • Ordinal regression
  • Negative Binomial regression
  • Tweedie distribution

Defining a GLM Model

  • model_id: (Optional) Specify a custom name for the model to use as a reference. By default, H2O automatically generates a destination key.

  • training_frame: (Required) Specify the dataset used to build the model. NOTE: In Flow, if you click the Build a model button from the Parse cell, the training frame is entered automatically.

  • validation_frame: (Optional) Specify the dataset used to evaluate the accuracy of the model.

  • nfolds: Specify the number of folds for cross-validation.

  • seed: Specify the random number generator (RNG) seed for algorithm components dependent on randomization. The seed is consistent for each H2O instance so that you can create models with the same starting conditions in alternative configurations.

  • y: (Required) Specify the column to use as the dependent variable.

    • For a regression model, this column must be numeric (Real or Int).
    • For a classification model, this column must be categorical (Enum or String). If the family is Binomial, the dataset cannot contain more than two levels.
  • x: Specify a vector containing the names or indices of the predictor variables to use when building the model. If x is missing, then all columns except y are used.

  • keep_cross_validation_predictions: Specify whether to keep the cross-validation predictions.

  • keep_cross_validation_fold_assignment: Enable this option to preserve the cross-validation fold assignment.

  • fold_assignment: (Applicable only if a value for nfolds is specified and fold_column is not specified) Specify the cross-validation fold assignment scheme. The available options are AUTO (which is Random), Random, Modulo, or Stratified (which will stratify the folds based on the response variable for classification problems).

  • fold_column: Specify the column that contains the cross-validation fold index assignment per observation.

  • ignored_columns: (Optional, Python and Flow only) Specify the column or columns to be excluded from the model. In Flow, click the checkbox next to a column name to add it to the list of columns excluded from the model. To add all columns, click the All button. To remove a column from the list of ignored columns, click the X next to the column name. To remove all columns from the list of ignored columns, click the None button. To search for a specific column, type the column name in the Search field above the column list. To only show columns with a specific percentage of missing values, specify the percentage in the Only show columns with more than 0% missing values field. To change the selections for the hidden columns, use the Select Visible or Deselect Visible buttons.

  • random_columns: An array of random columns to be used for HGLM.

  • ignore_const_cols: Enable this option to ignore constant training columns, since no information can be gained from them. This option is enabled by default.

  • score_each_iteration: (Optional) Enable this option to score during each iteration of the model training.

  • offset_column: Specify a column to use as the offset; the value cannot be the same as the value for the weights_column.

    Note: Offsets are per-row “bias values” that are used during model training. For Gaussian distributions, they can be seen as simple corrections to the response (y) column. Instead of learning to predict the response (y-row), the model learns to predict the (row) offset of the response column. For other distributions, the offset corrections are applied in the linearized space before applying the inverse link function to get the actual response values.

  • weights_column: Specify a column to use for the observation weights, which are used for bias correction. The specified weights_column must be included in the specified training_frame. Python only: To use a weights column when passing an H2OFrame to x instead of a list of column names, the specified training_frame must contain the specified weights_column.

    Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor.

  • family: Specify the model type.

    • If the family is gaussian, the response must be numeric (Real or Int). (default)
    • If the family is binomial, the response must be categorical 2 levels/classes or binary (Enum or Int).
    • If the family is multinomial, the response can be categorical with more than two levels/classes (Enum).
    • If the family is ordinal, the response must be categorical with at least 3 levels.
    • If the family is quasibinomial, the response must be numeric.
    • If the family is poisson, the response must be numeric and non-negative (Int).
    • If the family is negativebinomial, the response must be numeric and non-negative (Int).
    • If the family is gamma, the response must be numeric and continuous and positive (Real or Int).
    • If the family is tweedie, the response must be numeric and continuous (Real) and non-negative.
  • rand_family: The Random Component Family specified as an array. You must include one family for each random component. Currently only rand_family={"[gaussian]"} is supported.

  • tweedie_variance_power: (Only applicable if Tweedie is specified for Family) Specify the Tweedie variance power.

  • tweedie_link_power: (Only applicable if Tweedie is specified for Family) Specify the Tweedie link power.

  • theta: Theta value (equal to 1/r) for use with the negative binomial family. This value must be > 0 and defaults to 1e-10.

  • solver: Specify the solver to use (AUTO, IRLSM, L_BFGS, COORDINATE_DESCENT_NAIVE, COORDINATE_DESCENT, GRADIENT_DESCENT_LH, or GRADIENT_DESCENT_SQERR). IRLSM is fast on problems with a small number of predictors and for lambda search with L1 penalty, while L_BFGS scales better for datasets with many columns. COORDINATE_DESCENT is IRLSM with the covariance updates version of cyclical coordinate descent in the innermost loop. COORDINATE_DESCENT_NAIVE is IRLSM with the naive updates version of cyclical coordinate descent in the innermost loop. GRADIENT_DESCENT_LH and GRADIENT_DESCENT_SQERR can only be used with the Ordinal family.

  • alpha: Specify the regularization distribution between L1 and L2.

  • lambda: Specify the regularization strength.

  • lambda_search: Specify whether to enable lambda search, starting with lambda max (the smallest \(\lambda\) that drives all coefficients to zero). If you also specify a value for lambda_min_ratio, then this value is interpreted as lambda min. If you do not specify a value for lambda_min_ratio, then GLM will calculate the minimum lambda.

  • early_stopping: Specify whether to stop early when there is no more relative improvement on the training or validation set.

  • nlambdas: (Applicable only if lambda_search is enabled) Specify the number of lambdas to use in the search. The default is 100.

  • standardize: Specify whether to standardize the numeric columns to have a mean of zero and unit variance. Standardization is highly recommended; if you do not use standardization, the results can include components that are dominated by variables that appear to have larger variances relative to other attributes as a matter of scale, rather than true contribution. This option is enabled by default.

  • missing_values_handling: Specify how to handle missing values (Skip, MeanImputation, or PlugValues).

  • plug_values: When missing_values_handling="PlugValues", specify a single row frame containing values that will be used to impute missing values of the training/validation frame.

  • compute_p_values: Request computation of p-values. Only applicable with no penalty (lambda = 0 and no beta constraints). Setting remove_collinear_columns is recommended. H2O will return an error if p-values are requested and there are collinear columns and remove_collinear_columns flag is not enabled. Note that this option is not available for family="multinomial" or family="ordinal".

  • remove_collinear_columns: Specify whether to automatically remove collinear columns during model-building. When enabled, collinear columns will be dropped from the model and will have 0 coefficient in the returned model. This can only be set if there is no regularization (lambda=0).

  • intercept: Specify whether to include a constant term in the model. This option is enabled by default.

  • non_negative: Specify whether to force coefficients to have non-negative values.

  • max_iterations: Specify the number of training iterations.

  • objective_epsilon: Specify a threshold for convergence. If the change in the objective value is less than this threshold, the model is considered converged.

  • beta_epsilon: Specify the beta epsilon value. If the L1 norm of the current beta change is below this threshold, the model is considered converged.

  • gradient_epsilon: (For L-BFGS only) Specify a threshold for convergence. If the objective value (using the L-infinity norm) is less than this threshold, the model is converged.

  • link: Specify a link function (Identity, Family_Default, Logit, Log, Inverse, Tweedie, or Ologit).

    • If the family is Gaussian, then Identity, Log, and Inverse are supported.
    • If the family is Binomial, then Logit is supported.
    • If the family is Poisson, then Log and Identity are supported.
    • If the family is Gamma, then Inverse, Log, and Identity are supported.
    • If the family is Tweedie, then only Tweedie is supported.
    • If the family is Multinomial, then only Family_Default is supported. (This defaults to multinomial.)
    • If the family is Quasibinomial, then only Logit is supported.
    • If the family is Ordinal, then only Ologit is supported.
    • If the family is Negative Binomial, then only Log and Identity are supported.
  • rand_link: The link function for random component in HGLM specified as an array. Available options include identity and family_default.

  • startval: The initial starting values for fixed and randomized coefficients in HGLM specified as a double array.

  • calc_like: Specify whether to return likelihood function value for HGLM. This is disabled by default.

  • hglm: If enabled, then an HGLM model will be built; if disabled (default), then a GLM model will be built.

  • prior: Specify prior probability for p(y==1). Use this parameter for logistic regression if the data has been sampled and the mean of response does not reflect reality. This value defaults to -1; when specified, it must be a value in the range (0,1).

    Note: This is a simple method affecting only the intercept. You may want to use weights and offset for a better fit.

  • lambda_min_ratio: Specify the minimum lambda to use for lambda search (specified as a ratio of lambda_max, which is the smallest \(\lambda\) for which the solution is all zeros).

  • beta_constraints: Specify a dataset to use beta constraints. The selected frame is used to constrain the coefficient vector to provide upper and lower bounds. The dataset must contain a names column with valid coefficient names.

  • max_active_predictors: Specify the maximum number of active predictors during computation. This value is used as a stopping criterion to prevent expensive model building with many predictors.

  • interactions: Specify a list of predictor column indices to interact. All pairwise combinations will be computed for this list.

  • interaction_pairs: When defining interactions, use this option to specify a list of pairwise column interactions (interactions between two variables). Note that this is different than interactions, which will compute all pairwise combinations of specified columns.

  • obj_reg: Specifies the likelihood divider in objective value computation. This defaults to 1/nobs.

  • export_checkpoints_dir: Specify a directory to which generated models will automatically be exported.
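
The family and link compatibility rules listed under the link option can be collected into a small lookup table. The sketch below is illustrative only: `SUPPORTED_LINKS` and `check_link` are hypothetical names, not part of the H2O API, and the table simply transcribes the bullets above. Treating Family_Default as always acceptable is an assumption of this sketch.

```python
# Hypothetical helper: transcribes the family/link rules from the list above.
SUPPORTED_LINKS = {
    "gaussian": {"identity", "log", "inverse"},
    "binomial": {"logit"},
    "poisson": {"log", "identity"},
    "gamma": {"inverse", "log", "identity"},
    "tweedie": {"tweedie"},
    "multinomial": {"family_default"},
    "quasibinomial": {"logit"},
    "ordinal": {"ologit"},
    "negativebinomial": {"log", "identity"},
}


def check_link(family: str, link: str) -> bool:
    """Return True if `link` is listed as supported for `family`."""
    family, link = family.lower(), link.lower()
    if family not in SUPPORTED_LINKS:
        raise ValueError(f"unknown family: {family!r}")
    # Assumption: Family_Default defers the choice to the family itself,
    # so it is accepted for every family.
    return link == "family_default" or link in SUPPORTED_LINKS[family]
```

A check like this can catch an unsupported combination (for example, binomial with Identity) before a model build is submitted.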

Interpreting a GLM Model

By default, the following output displays:

  • Model parameters (hidden)
  • A bar chart representing the standardized coefficient magnitudes (blue for negative, orange for positive). Note that this only displays if standardization is enabled.
  • A graph of the scoring history (objective vs. iteration)
  • Output (model category, validation metrics, and standardized coefficients magnitude)
  • GLM model summary (family, link, regularization, number of total predictors, number of active predictors, number of iterations, training frame)
  • Scoring history in tabular form (timestamp, duration, iteration, log likelihood, objective)
  • Training metrics (model, model checksum, frame, frame checksum, description, model category, scoring time, predictions, MSE, r2, residual deviance, null deviance, AIC, null degrees of freedom, residual degrees of freedom)
  • Coefficients
  • Standardized coefficient magnitudes (if standardization is enabled)

Classification and Regression

GLM can produce two categories of models: classification and regression. Logistic regression is the GLM performing binary classification.

Handling of Categorical Variables

GLM supports both binary and multinomial classification. For binary classification, the response column can only have two levels; for multinomial classification, the response column will have more than two levels. We recommend letting GLM handle categorical columns, as it can take advantage of the categorical column for better performance and memory utilization.

We strongly recommend avoiding one-hot encoding categorical columns with any levels into many binary columns, as this is very inefficient. This is especially true for Python users who are used to expanding their categorical variables manually for other frameworks.

Handling of Numeric Variables

When GLM performs regression (with factor columns), one category can be left out to avoid multicollinearity. If regularization is disabled (lambda = 0), then one category is left out. However, when using the default lambda parameter, all categories are included.

The reason for the different behavior with regularization is that collinearity is not a problem with regularization. It is also better to let regularization determine which level to ignore (or how to distribute the coefficients between the levels).
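
The two behaviors can be pictured with a toy expansion of a single categorical column. This is only a conceptual sketch of what happens internally: `expand_categorical` is a hypothetical helper, and the choice of which level is dropped (here, the first in sorted order) is an assumption made for illustration. As recommended earlier, categorical columns should still be passed to GLM unexpanded.

```python
def expand_categorical(values, drop_first):
    """One-hot expand a categorical column into indicator columns.

    drop_first=True mimics the unregularized case (lambda = 0), where one
    level is left out to avoid collinearity with the intercept.
    drop_first=False mimics the regularized case, where all levels are kept.
    """
    levels = sorted(set(values))
    # Assumption: the dropped reference level is the first one in sorted order.
    kept = levels[1:] if drop_first else levels
    rows = [[1 if v == level else 0 for level in kept] for v in values]
    return kept, rows


colors = ["red", "green", "blue", "green"]
kept_all, rows_all = expand_categorical(colors, drop_first=False)
kept_ref, rows_ref = expand_categorical(colors, drop_first=True)
```

With all levels kept, every row's indicators sum to 1, which duplicates the intercept column; dropping one level removes that exact linear dependence.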

Hierarchical GLM

Introduced in 3.28.0.1, Hierarchical GLM (HGLM) fits generalized linear models with random effects, where the random effect can come from a conjugate exponential-family distribution (for example, Gaussian). HGLM allows you to specify both fixed and random effects, which allows fitting correlated random effects as well as random regression models. HGLM can be used for linear mixed models and for generalized linear mixed models with random effects for a variety of links and a variety of distributions for both the outcomes and the random effects.

Note: The initial release of HGLM supports only the Gaussian family and random family.

Gaussian Family and Random Family in HGLM

To build an HGLM, we need the hierarchical log-likelihood (h-likelihood) function. The h-likelihood function can be expressed as (equation 1):

\[h(\beta, \theta, u) = \log(f (y|u)) + \log (f(u))\]

for fixed effects \(\beta\), variance components \(\theta\), and random effects \(u\).

A standard linear mixed model can be expressed as (equation 2):

\[y = X\beta + Zu + e\]

where

  • \(e \sim N(0, I_n \delta_e^2), u \sim N(0, I_q \delta_u^2)\)
  • \(e, u\) are independent, and \(u\) represents the random effects
  • \(n\) is the number of i.i.d observations of \(y\) with mean \(0\)
  • \(q\) is the number of values \(Z\) can take

Rewriting equation 2 as \(e = y - X\beta - Zu\), we can derive the h-likelihood as:

../_images/h-likelihood.png

where \(C_1 = - \frac{n}{2} \log(2\pi), C_2 = - \frac{q}{2} \log(2\pi)\)
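
For reference, substituting the Gaussian densities of \(e\) and \(u\) into equation 1 gives the expansion below. This is a sketch derived from equations 1 and 2 under the assumptions listed above (with \(u\) of length \(q\)), not a transcription of the figure; \(e_i\) denotes the \(i\)-th residual and \(u_j\) the \(j\)-th random effect.

\[h(\beta, \theta, u) = C_1 - \frac{n}{2} \log(\delta_e^2) - \frac{1}{2\delta_e^2} \sum_{i=1}^{n} e_i^2 + C_2 - \frac{q}{2} \log(\delta_u^2) - \frac{1}{2\delta_u^2} \sum_{j=1}^{q} u_j^2\]

The first three terms come from \(\log(f(y|u))\) and the last three from \(\log(f(u))\).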

In principle, the HGLM model building involves the following main steps:

  1. Set the initial values to \(\delta_u^2, \delta_e^2, u, \beta\)
  2. Estimate the fixed (\(\beta\)) and random effects (\(u\)) by solving for \(\frac{\partial h}{\partial \beta} = 0, \frac{\partial h}{\partial u} = 0\)
  3. Estimate variance components using the adjusted profile likelihood:
\[h_p = \big(h + \frac{1}{2} \log \big| 2 \pi D^{-1}\big| \big)_{\beta=\hat \beta, u=\hat u}\]

and solving for

\[\frac{\partial h_p}{\partial \theta} = 0\]

Note that \(D\) is the matrix of the second derivatives of \(h\) around \(\beta = \hat \beta, u = \hat u, \theta = (\delta_u^2, \delta_e^2)\).

H2O Implementation

In practice, Lee and Nelder (see References) showed that linear mixed models can be fitted using a hierarchy of GLM by using an augmented linear model. The linear mixed model will be written as:

\[\begin{split}y = X\beta + Zu + e \\ v = ZZ^T\sigma_u^2 + R\sigma_e^2\end{split}\]

where \(R\) is a diagonal matrix with elements given by the estimated dispersion model. The dispersion model refers to the variance part of the fixed effect model with error \(e\). There are cases where the dispersion model is modeled itself as \(exp(x_d, \beta_d)\). However, in our current version, the variance is just a constant \(\sigma_e^2\), and hence \(R\) is just a scalar value. It is initialized to be the identity matrix. The model can be written as an augmented weighted linear model:

\[y_a = T_a \delta + e_a\]

where

../_images/hglm_augmentation.png

Note that \(q\) is the number of columns in \(Z\), \(0_q\) is a vector of \(q\) zeroes, and \(I_q\) is the \(q \times q\) identity matrix. The variance-covariance matrix of the augmented residual matrix is

../_images/hglm_variance_covariance.png

Fixed and Random Coefficients Estimation

The estimates for \(\delta\) from weighted least squares are given by solving

\[T_a^T W^{-1} T_a \delta=T_a^T W^{-1} y_a\]

where

\[W= V(e_a )\]

The two variance components are estimated iteratively by applying a gamma GLM to the residuals \(e_i^2,u_i^2\). Because we are not using a dispersion model, there is only an intercept term in the linear predictors. The leverages \(h_i\) for these models are calculated from the diagonal elements of the hat matrix:

\[H_a=T_a (T_a^T W^{-1} T_a )^{-1} T_a^T W^{-1}\]
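
The weighted least squares step and the leverage computation can be sketched with NumPy as follows. This is a minimal illustration, not the H2O implementation. It assumes the standard Lee and Nelder augmentation, with \(y_a\) formed by stacking \(y\) over \(0_q\), \(T_a\) formed from \(X\), \(Z\), a zero block, and \(I_q\), and \(W\) diagonal with \(\sigma_e^2\) for the data rows and \(\sigma_u^2\) for the augmented rows. The function name and the toy data are hypothetical.

```python
import numpy as np


def fit_augmented(X, Z, y, sigma_e2, sigma_u2):
    """Solve T_a' W^-1 T_a delta = T_a' W^-1 y_a for delta = (beta, u)."""
    n, p = X.shape
    q = Z.shape[1]
    # Augmented design and response: the last q rows encode E(u) = 0.
    T_a = np.block([[X, Z], [np.zeros((q, p)), np.eye(q)]])
    y_a = np.concatenate([y, np.zeros(q)])
    # Diagonal of W = V(e_a), with R taken as the identity matrix.
    w_diag = np.concatenate([np.full(n, sigma_e2), np.full(q, sigma_u2)])
    W_inv = np.diag(1.0 / w_diag)
    A = T_a.T @ W_inv @ T_a
    delta = np.linalg.solve(A, T_a.T @ W_inv @ y_a)
    # Leverages: diagonal of H_a = T_a (T_a' W^-1 T_a)^-1 T_a' W^-1.
    H_a = T_a @ np.linalg.solve(A, T_a.T @ W_inv)
    return delta[:p], delta[p:], np.diag(H_a)


rng = np.random.default_rng(0)
X = np.column_stack([np.ones(12), rng.normal(size=12)])  # intercept + 1 predictor
Z = np.kron(np.eye(3), np.ones((4, 1)))                  # 3 groups of 4 rows each
y = rng.normal(size=12)
beta, u, leverages = fit_augmented(X, Z, y, sigma_e2=1.0, sigma_u2=0.5)
```

The first \(n\) leverages belong to the data rows and the last \(q\) to the random effects; together they sum to the number of columns of \(T_a\).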

Estimation of Fixed Effect Dispersion Parameter/Variance

A gamma GLM is used to fit the dispersion part of the model with response \(y_{d,i}=(e_i^2)/(1-h_i )\) where \(E(y_d )=u_d\) and \(u_d \equiv \phi\) (i.e., \(\delta_e^2\) for a Gaussian response). The GLM model for the dispersion parameter is then specified by the link function \(g_d (.)\) and the linear predictor \(X_d \beta_d\), with prior weights \((1-h_i )/2\), for \(g_d (u_d )=X_d \beta_d\). Because we are not using a dispersion model, \(X_d \beta_d\) will only contain the intercept term.

Estimation of Random Effect Dispersion Parameter/Variance

Similarly, a gamma GLM is fitted to the dispersion term \(\alpha\) (i.e., \(\delta_u^2\)) for the random effect \(v\), with \(y_{\alpha,j} = u_j^2/(1-h_{n+j}), j=1,2,\ldots,q\) and \(g_\alpha (u_\alpha )=\lambda\), where the prior weights are \((1-h_{n+j} )/2\), and the estimated dispersion term for the random effect is given by \(\hat \alpha = g_\alpha^{-1}(\hat \lambda)\).
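
Because both dispersion models contain only an intercept, the gamma GLM fit has a simple closed form: the fitted mean is the prior-weighted average of the responses. The sketch below shows this under that intercept-only assumption; the function names are hypothetical and the code is not taken from the H2O source.

```python
def gamma_intercept_only(responses, prior_weights):
    """Fitted mean of an intercept-only gamma GLM.

    With only an intercept, the weighted score equation reduces to
    sum(w_i * (y_i - mu)) = 0, so the fitted mean is the weighted average
    of the responses regardless of the link function.
    """
    total_weight = sum(prior_weights)
    return sum(w * y for w, y in zip(prior_weights, responses)) / total_weight


def estimate_dispersion(squared_residuals, leverages):
    """Dispersion estimate from squared residuals and their leverages."""
    # Response r_i / (1 - h_i) and prior weight (1 - h_i) / 2, as in the text.
    responses = [r / (1.0 - h) for r, h in zip(squared_residuals, leverages)]
    weights = [(1.0 - h) / 2.0 for h in leverages]
    return gamma_intercept_only(responses, weights)
```

Passing the data residuals \(e_i^2\) with leverages \(h_i\) gives the fixed effect dispersion, while passing \(u_j^2\) with \(h_{n+j}\) gives the random effect dispersion. In both cases the result equals the sum of squared residuals divided by the sum of \(1-h\).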

Fitting Algorithm Overview

The following fitting algorithm from “Generalized linear models with random effects” (Y. Lee, J. A. Nelder and Y. Pawitan; see References) is used to build our HGLM. Let \(n\) be the number of observations and \(k\) be the number of levels in the random effect. The algorithm that was implemented here at H2O will perform the following:

  1. Initialize starting values either from user by setting parameter startval or by the system if startval is left unspecified.
  2. Construct an augmented model with response \(y_{aug}= {y \choose {E(u)}}\).
  3. Use a GLM to estimate \(\delta={\beta \choose u}\) given the dispersion \(\phi\) and \(\lambda\). Save the deviance components and leverages from the fitted model.
  4. Use a gamma GLM to estimate the dispersion parameter for \(\phi\) (i.e. \(\delta_e^2\) for a Gaussian response).
  5. Use a similar GLM as in step 4 to estimate \(\lambda\) from the last \(k\) deviance components and leverages obtained from the GLM in step 3.
  6. Iterate between steps 3-5 until convergence. Iteration stops when either a timeout event occurs or the following condition is met: \(\frac {\Sigma_i{(\text{eta}.i - \text{eta}.o)^2}} {\Sigma_i(\text{eta}.i)^2} < 1e-6\).

A timeout event can be defined as the following:

  1. The maximum number of iterations has been reached
  2. Model building run time exceeds what is specified in max_runtime_secs
  3. A user has clicked on stop model button or similar from Flow.
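
The stopping rule can be sketched as a pair of small functions. This is an illustration of the convergence ratio and the first two timeout events only; the function names, the argument list, and the convention that a non-positive `max_runtime_secs` means no time limit are assumptions of the sketch. A user-initiated stop is not modeled.

```python
def relative_eta_change(eta_new, eta_old):
    """Sum of squared changes in eta divided by the sum of squared eta.

    Assumes eta_new is not identically zero.
    """
    num = sum((a - b) ** 2 for a, b in zip(eta_new, eta_old))
    den = sum(a ** 2 for a in eta_new)
    return num / den


def should_stop(eta_new, eta_old, iteration, max_iterations,
                elapsed_secs, max_runtime_secs, tol=1e-6):
    """Stop on convergence or on a timeout event (iterations or run time)."""
    converged = relative_eta_change(eta_new, eta_old) < tol
    # Assumption for this sketch: max_runtime_secs <= 0 means no time limit.
    timed_out = iteration >= max_iterations or (
        max_runtime_secs > 0 and elapsed_secs > max_runtime_secs)
    return converged or timed_out
```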

For families and random families other than Gaussian, link functions are used to translate from the linear space to the mean output of the model.

Linear Mixed Model with Correlated Random Effect

Let \(A\) be a matrix with known elements that describe the correlation among the random effects. The model is now given by:

../_images/hglm_linear_mixed_model1.png

where \(N\) is the normal distribution and \(MVN\) is the multivariate normal distribution. This can be easily translated to:

../_images/hglm_linear_mixed_model2.png

where \(Z^* = ZL\) and \(L\) is the Cholesky factor of \(A\) (that is, \(A = LL^T\)). Hence, if you have correlated random effects, you can first apply this transformation to your data before using our HGLM implementation.
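
A minimal NumPy sketch of this preprocessing step is shown below. It assumes \(A\) is symmetric positive definite so that a lower-triangular Cholesky factor exists; the function name and the toy matrices are hypothetical.

```python
import numpy as np


def decorrelate_random_effects(Z, A):
    """Return Z* = Z L, where A = L L^T is the Cholesky factorization of A."""
    L = np.linalg.cholesky(A)  # lower-triangular factor
    return Z @ L


A = np.array([[1.0, 0.5], [0.5, 1.0]])              # known correlation of the random effects
Z = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # random effects design matrix
Z_star = decorrelate_random_effects(Z, A)
```

The transformation preserves the covariance contributed by the random effects, since \(Z^* Z^{*T} = Z A Z^T\), while the transformed random effects can be treated as independent.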

HGLM Model Metrics

H2O provides the following model metrics at the end of each HGLM experiment:

  • fixef: fixed effects coefficients

  • ranef: random effects coefficients

  • randc: vector of random column indices

  • varfix: dispersion parameter of the mean model

  • varranef: dispersion parameter of the random effects

  • converge: true if algorithm has converged, otherwise false

  • sefe: standard errors of fixed effects

  • sere: standard errors of random effects

  • dfrefe: deviance degrees of freedom for the mean part of model

  • sumvc1: estimates and standard errors of linear predictor in the dispersion model

  • summvc2: estimates and standard errors of the linear predictor for the dispersion parameter of the random effects

  • likelihood: if calc_like is true, the following four values are returned:

    • hlik: log-h-likelihood;
    • pvh: adjusted profile log-likelihood profiled over the random effects;
    • pbvh: adjusted profile log-likelihood profiled over fixed and random effects;
    • caic: conditional AIC.
  • bad: row index of the most influential observation.

Mapping of Fitting Algorithm to the H2O-3 Implementation

This mapping is done in four steps:

  1. Initialize starting values by the system.
  2. Estimate \(\delta =\) \(\beta \choose u\).
  3. Estimate \(\delta_e^2(\text {tau})\).
  4. Estimate \(\delta_u^2(\text {phi})\).

Step 1: Initialize starting values by the system.

Following the implementation from R, when a user fails to specify starting values for psi, \(\beta\), \(\mu\), \(\delta_e^2\), \(\delta_u^2\), we will do it for the users as follows:

  1. A GLM model is built with just the fixed columns and response.
  2. Next init_sig_e(\(\delta_e^2\))/tau is set to 0.6*residual_deviance()/residual_degrees_of_freedom().
  3. init_sig_u(\(\delta_u^2\)) is set to 0.66*init_sig_e.
  4. For numerical stability, we restrict the magnitude of init_sig_e and init_sig_u to >= 0.1.
  5. Set phi = vector of length number of random columns of value init_sig_u/(number of random columns).
  6. Set \(\beta\) to the GLM model coefficients, \(\mu\) to be a zero vector.
  7. Set psi to be a zero vector.
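
The initialization recipe in steps 2 to 7 can be written out as a short function. This is a sketch that follows the order of the steps above, not the H2O source. The function name and the returned dictionary layout are hypothetical, and giving \(\mu\) and psi one entry per expanded random level is an assumption of the sketch.

```python
def init_hglm_values(residual_deviance, residual_dof, glm_coefficients,
                     n_random_columns, n_random_levels):
    """Default starting values following steps 2 to 7 above."""
    init_sig_e = 0.6 * residual_deviance / residual_dof
    init_sig_u = 0.66 * init_sig_e
    # Floor both variances at 0.1 for numerical stability.
    init_sig_e = max(init_sig_e, 0.1)
    init_sig_u = max(init_sig_u, 0.1)
    return {
        "tau": init_sig_e,
        "phi": [init_sig_u / n_random_columns] * n_random_columns,
        "beta": list(glm_coefficients),
        # Assumption: mu and psi have one entry per expanded random level.
        "mu": [0.0] * n_random_levels,
        "psi": [0.0] * n_random_levels,
    }


start = init_hglm_values(residual_deviance=10.0, residual_dof=5,
                         glm_coefficients=[0.5, -1.0],
                         n_random_columns=2, n_random_levels=3)
```

The residual deviance and degrees of freedom would come from the fixed-effects-only GLM built in step 1.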

Step 2: Estimate \(\delta =\) \(\beta \choose u\).

Given the current values of \(\delta_e^2, \delta_u^2\), we will solve for \(\delta =\) \(\beta \choose u\). Instead of solving \(\delta\) from \(T_a^T W^{-1} T_a \delta=T_a^T W^{-1} y_a\), a different set of formulae is used. A loop is used to solve for the coefficients:

  1. The following variables are generated:
  • \(v.i= g_r^{-1} (u_i)\) where \(u_i\) are the random coefficients of the random effects/columns and \(g_r^{-1}\) can be considered as the inverse link function.
  • \(tau\) is a vector of length number of data containing init.sig.e;
  • \(eta.i=X_i \beta+offset\) and store the previous \(eta.i\) as \(eta.o\).
  • \(mu.i=g^{-1} (eta.i)\).
  • dmu_deta is derivative of \(g^{-1} (eta.i)\) with respect to \(eta.i\), which is 1 for identity link.
  • \(z_i=eta.i-offset+(y_i-mu.i)/\text {dmu_deta}\)
  • \(zmi= \text{psi}\)
  • \(augZ =\) \(zi \choose zmi\).
  • du_dv is the derivative of \(g_r^{-1} (u_i)\) with respect to \(v.i.\) Again, for identity link, this is 1.
  • The weight \(W =\) \(wdata \choose wpsi\) where \(wdata = \frac {\text{dmu_deta}^2}{\text{prior_weight} * \text{family}\$\text{variance}(mu.i) * tau}\) and \(wpsi = \frac {\text{du_dv}^2}{\text{prior_weight} * \text{family}\$\text{variance}(psi) * phi}\)
  2. Finally the following formula is used to solve for the parameters: \(augXZ \cdot \delta=augZW\) where \(augXZ=T_a \cdot W\) and \(augZW=augZ \cdot W\):
  • Use QR decomposition to augXZ and obtain: \(QR \delta = augZW\).
  • Use backward solve to obtain the coefficients \(\delta\) from \(R \delta = Q^T augZW\).
  • Calculate \(hv=\text{rowsum}(Q)\) of length n + number of expanded random columns and store in returnFrame.
  • Calculate \(dev =\) \(prior weight*(y_i-mu.i)^2 \choose (psi -u_i )^2\) of length n + number of expanded random columns and store in returnFrame.
  • Calculate \(resid= \frac {(y-mu.i)} {\sqrt{\frac {sum(dev)(1-hv)}{n-p}}}\) of length n and store in returnFrame.
  • Go back to step 1 unless \(\Sigma_i(eta.i-eta.o)^2 / \Sigma_i(eta.i)^2<1e-6\) or a timeout event has occurred.
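
The QR-based solve in the second part of the loop can be sketched with NumPy. This is an illustration under stated assumptions, not the H2O code: `row_scale` is a hypothetical per-row scaling vector standing in for \(W\), a general linear solver is used in place of a dedicated back-substitution routine, and the leverages are computed as row sums of the squared entries of \(Q\), which is the usual identity for the diagonal of the hat matrix (the text above writes this as rowsum(Q)).

```python
import numpy as np


def solve_weighted_qr(T_a, aug_z, row_scale):
    """Solve the row-scaled system (T_a * s) delta = (aug_z * s) by QR."""
    aug_xz = T_a * row_scale[:, None]
    aug_zw = aug_z * row_scale
    Q, R = np.linalg.qr(aug_xz)               # reduced QR: Q has one column per coefficient
    delta = np.linalg.solve(R, Q.T @ aug_zw)  # R is upper triangular
    # Leverages as row sums of the squared entries of Q (diagonal of Q Q^T).
    hv = np.sum(Q ** 2, axis=1)
    return delta, hv


rng = np.random.default_rng(1)
T_a = rng.normal(size=(8, 3))      # toy augmented design
aug_z = rng.normal(size=8)         # toy augmented working response
row_scale = np.sqrt(rng.uniform(0.5, 2.0, size=8))
delta, hv = solve_weighted_qr(T_a, aug_z, row_scale)
```

Solving through QR avoids forming the normal equations explicitly, which is generally better conditioned numerically.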

Step 3: Estimate \(\delta_e^2(\text {tau})\)

With the newly estimated fixed and random coefficients, we will estimate the dispersion parameter for the fixed effects/columns by building a gamma GLM:

  1. Generate a training frame with constant predictor column of 1 to force glm model to generate only the intercept term:
  • Response column as \(dev/(1-hv)\).
  • Weight column as \((1-hv)/2\).
  • Predictor column of ones.
  • The length of the training frame is the number of data rows.
  2. Build a gamma GLM with family=gamma and link=log.
  3. Set \(tau = \text {exp (intercept value)}\).
  4. Assign estimation standard error and sigma from the GLM standard error calculation for coefficients.

Step 4: Estimate \(\delta_u^2(\text {phi})\).

Again, a gamma GLM model is used here. In addition, the error estimates are generated for each random column. Exactly the same steps are used here as in Step 3. The only difference is that we are looking at the \(dev,hv\) corresponding to the expanded random columns/effects.

Regularization

Regularization is used to attempt to solve problems with overfitting that can occur in GLM. Penalties can be introduced to the model building process to avoid overfitting, to reduce variance of the prediction error, and to handle correlated predictors. The two most common penalized models are ridge regression and LASSO (least absolute shrinkage and selection operator). The elastic net combines both penalties using both the alpha and lambda options (i.e., values greater than 0 for both).

LASSO and Ridge Regression

LASSO represents the \(\ell{_1}\) penalty and is an alternative regularized least squares method that penalizes the sum of the absolute coefficients \(||\beta||{_1} = \sum{^p_{k=1}} |\beta{_k}|\). LASSO leads to a sparse solution when the tuning parameter is sufficiently large. As the tuning parameter value \(\lambda\) is increased, more coefficients are set to zero, until eventually all of them are. Because reducing parameters to zero removes them from the model, LASSO is a good selection tool.

Ridge regression penalizes the \(\ell{_2}\) norm of the model coefficients \(||\beta||{^2_2} = \sum{^p_{k=1}} \beta{^2_k}\). It provides greater numerical stability and is easier and faster to compute than LASSO. It keeps all the predictors in the model and shrinks them proportionally. Ridge regression reduces coefficient values simultaneously as the penalty is increased without setting any of them to zero.

Variable selection is important in numerous modern applications with many covariates where the \(\ell{_1}\) penalty has proven to be successful. Therefore, if the number of variables is large or if the solution is known to be sparse, we recommend using LASSO, which will select a small number of variables for sufficiently high \(\lambda\) that could be crucial to the interpretability of the model. The \(\ell{_2}\) norm does not have this effect; it shrinks the coefficients but does not set them exactly to zero.

The two penalties also differ in the presence of correlated predictors. The \(\ell{_2}\) penalty shrinks coefficients for correlated columns toward each other, while the \(\ell{_1}\) penalty tends to select only one of them and sets the other coefficients to zero. Using the elastic net argument \(\alpha\) combines these two behaviors.

The elastic net method selects variables and preserves the grouping effect (shrinking coefficients of correlated columns together). Moreover, while the number of predictors that can enter a LASSO model saturates at min \((n,p)\) (where \(n\) is the number of observations, and \(p\) is the number of variables in the model), the elastic net does not have this limitation and can fit models with a larger number of predictors.
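
The contrast between the two penalties is easiest to see in the special case of an orthonormal design, where each penalized coefficient has a closed form in terms of the ordinary least squares estimate. The sketch below assumes the objective \(\frac{1}{2}||y - X\beta||_2^2\) plus either \(\lambda ||\beta||_1\) or \(\frac{\lambda}{2} ||\beta||_2^2\), with \(X^T X = I\). This textbook setting is an assumption of the example, not a description of how H2O solves the problem, and the function names are illustrative.

```python
def lasso_shrink(b, lam):
    """Soft-thresholding: closed-form LASSO solution for an orthonormal design."""
    magnitude = max(abs(b) - lam, 0.0)
    return magnitude if b >= 0 else -magnitude


def ridge_shrink(b, lam):
    """Proportional shrinkage: closed-form ridge solution for an orthonormal design."""
    return b / (1.0 + lam)


ols = [3.0, -0.5, 1.2]  # hypothetical least squares estimates
lasso = [lasso_shrink(b, 1.0) for b in ols]
ridge = [ridge_shrink(b, 1.0) for b in ols]
```

With \(\lambda = 1\), LASSO removes the small middle coefficient entirely, while ridge halves every coefficient and keeps all three in the model.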

Elastic Net Penalty

As indicated previously, elastic net regularization is a combination of the \(\ell{_1}\) and \(\ell{_2}\) penalties parametrized by the \(\alpha\) and \(\lambda\) arguments (similar to “Regularization Paths for Generalized Linear Models via Coordinate Descent” by Friedman et al.).

  • \(\alpha\) controls the elastic net penalty distribution between the \(\ell_1\) and \(\ell_2\) norms. It can have any value in the [0,1] range or a vector of values (via grid search). If \(\alpha=0\), then H2O solves the GLM using ridge regression. If \(\alpha=1\), then LASSO penalty is used.
  • \(\lambda\) controls the penalty strength. The range is any positive value or a vector of values (via grid search). Note that \(\lambda\) values are capped at \(\lambda_{max}\), which is the smallest \(\lambda\) for which the solution is all zeros (except for the intercept term).

The combination of the \(\ell_1\) and \(\ell_2\) penalties is beneficial because \(\ell_1\) induces sparsity, while \(\ell_2\) gives stability and encourages the grouping effect (where a group of correlated variables tend to be dropped or added into the model simultaneously). When focusing on sparsity, one possible use of the \(\alpha\) argument involves using the \(\ell_1\) mainly with very little \(\ell_2\) (\(\alpha\) almost 1) to stabilize the computation and improve convergence speed.
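
As a concrete illustration, the penalty term can be computed directly from a coefficient vector. The sketch assumes the parametrization used by Friedman et al., \(\lambda (\alpha ||\beta||_1 + \frac{1-\alpha}{2} ||\beta||_2^2)\), and assumes that `beta` excludes the intercept; the function name is hypothetical.

```python
def elastic_net_penalty(beta, alpha, lam):
    """lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2).

    Assumption: `beta` holds the non-intercept coefficients only.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    l1 = sum(abs(b) for b in beta)
    l2_squared = sum(b * b for b in beta)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_squared)
```

Setting `alpha=1` leaves only the \(\ell_1\) term (LASSO), and `alpha=0` leaves only the \(\ell_2\) term (ridge regression).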

Regularization Parameters in GLM

To get the best possible model, we need to find the optimal values of the regularization parameters \(\alpha\) and \(\lambda\). To find the optimal values, H2O allows you to perform a grid search over \(\alpha\) and a special form of grid search called “lambda search” over \(\lambda\).

The recommended way to find optimal regularization settings on H2O is to do a grid search over a few \(\alpha\) values with an automatic lambda search for each \(\alpha\).

  • Alpha
The alpha parameter controls the distribution between the \(\ell{_1}\) (LASSO) and \(\ell{_2}\) (ridge regression) penalties. A value of 1.0 for alpha represents LASSO, and an alpha value of 0.0 produces ridge regression.
  • Lambda
The lambda parameter controls the amount of regularization applied. If lambda is 0.0, no regularization is applied, and the alpha parameter is ignored. The default value for lambda is calculated by H2O using a heuristic based on the training data. If you allow H2O to calculate the value for lambda, you can see the chosen value in the model output.
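
The recommended approach (a small grid over alpha with an automatic lambda search for each value) might look like the following in the Python client. This is a sketch under assumptions: the binomial family, the particular alpha values, the AUC ranking, and the function name `alpha_grid_with_lambda_search` are illustrative choices, and the frames passed in are expected to be existing H2OFrames.

```python
ALPHA_GRID = [0.0, 0.25, 0.5, 0.75, 1.0]  # illustrative values, not a recommendation


def alpha_grid_with_lambda_search(training_frame, validation_frame, predictors, response):
    """Train one GLM per alpha value, each with an automatic lambda search."""
    # Imported lazily so the sketch can be loaded without a running H2O cluster.
    from h2o.estimators.glm import H2OGeneralizedLinearEstimator
    from h2o.grid.grid_search import H2OGridSearch

    base_model = H2OGeneralizedLinearEstimator(family="binomial", lambda_search=True)
    grid = H2OGridSearch(model=base_model, hyper_params={"alpha": ALPHA_GRID})
    grid.train(x=predictors, y=response,
               training_frame=training_frame, validation_frame=validation_frame)
    # Return the grid sorted so that the model with the highest AUC comes first.
    return grid.get_grid(sort_by="auc", decreasing=True)
```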

Full Regularization Path

It can sometimes be useful to see the coefficients for all lambda values or to override the default lambda selection. The full regularization path can be extracted from both the R and Python clients (currently not from Flow). It returns coefficients (and standardized coefficients) for all computed lambda values, as well as the explained deviances on both the training and validation data. Subsequently, the makeGLMModel call can be used to create an H2O GLM model with selected coefficients.

To extract the regularization path from R or Python:

  • R: call h2o.getGLMFullRegularizationPath. This takes the model as an argument.
  • Python: call H2OGeneralizedLinearEstimator.getGLMRegularizationPath (static method). This takes the model as an argument.

Solvers

This section provides general guidelines for getting the best performance from the GLM implementation. The optimal solver depends on the data properties and prior information regarding the variables (if available). In general, the data are considered sparse if the ratio of zeros to non-zeros in the input matrix is greater than 10. The solution is sparse when only a subset of the original set of variables is intended to be kept in the model. In a dense solution, all predictors have non-zero coefficients in the final model.

In GLM, you can specify one of the following solvers:

  • IRLSM: Iteratively Reweighted Least Squares Method (default)
  • L_BFGS: Limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm
  • AUTO: Sets the solver based on given data and parameters.
  • COORDINATE_DESCENT: Coordinate Descent (not available when family=multinomial)
  • COORDINATE_DESCENT_NAIVE: Coordinate Descent Naive
  • GRADIENT_DESCENT_LH: Gradient Descent Likelihood (available for Ordinal family only; default for Ordinal family)
  • GRADIENT_DESCENT_SQERR: Gradient Descent Squared Error (available for Ordinal family only)

IRLSM and L-BFGS

IRLSM (the default) is efficient for tall and narrow datasets and when running lambda search with a sparse solution. For wider and dense datasets (thousands of predictors and up), the L-BFGS solver scales better. If there are fewer than 500 predictors (or so) in the data, then use the default solver (IRLSM). For larger numbers of predictors, we recommend running IRLSM with a lambda search, and then comparing it to L-BFGS with just one \(\ell_2\) penalty. For advanced users, we recommend the following general guidelines:

  • For a dense solution and a dense dataset, use IRLSM if there are fewer than 500 predictors in the data; otherwise, use L-BFGS. Set alpha=0 to include \(\ell_2\) regularization in the elastic net penalty term to avoid inducing sparsity in the model.
  • For a dense solution with a sparse dataset, use IRLSM if there are fewer than 2000 predictors in the data; otherwise, use L-BFGS. Set alpha=0.
  • For a sparse solution with a dense dataset, use IRLSM with lambda_search=TRUE if fewer than 500 active predictors in the solution are expected; otherwise, use L-BFGS. Set alpha to be greater than 0 to add in an \(\ell_1\) penalty to the elastic net regularization, which induces sparsity in the estimated coefficients.
  • For a sparse solution with a sparse dataset, use IRLSM with lambda_search=TRUE if you expect fewer than 5000 active predictors in the solution; otherwise, use L-BFGS. Set alpha to be greater than 0.
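
The four guidelines above can be condensed into a small decision helper. This is only a restatement of the rules of thumb in code. The function names are hypothetical, and `None` is returned for lambda_search wherever the guidelines say nothing about it.

```python
def is_sparse_data(n_zeros, n_nonzeros):
    """Data are considered sparse when zeros outnumber non-zeros by more than 10 to 1."""
    return n_zeros > 10 * n_nonzeros


def suggest_solver(dense_data, dense_solution, n_predictors):
    """Return (solver, alpha_hint, lambda_search) following the guidelines above.

    For a sparse solution, `n_predictors` is the number of active predictors
    expected in the final model, not the number of input columns.
    """
    if dense_solution:
        limit = 500 if dense_data else 2000
        solver = "IRLSM" if n_predictors < limit else "L_BFGS"
        return solver, "alpha = 0", None
    limit = 500 if dense_data else 5000
    if n_predictors < limit:
        return "IRLSM", "alpha > 0", True
    return "L_BFGS", "alpha > 0", None


# A sparse dataset where roughly 4000 predictors are expected to stay active.
choice = suggest_solver(dense_data=False, dense_solution=False, n_predictors=4000)
```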

If you are unsure whether the solution should be sparse or dense, try both along with a grid of alpha values. The optimal model can be picked based on its performance on the validation data (or alternatively, based on the performance in cross-validation when not enough data is available to have a separate validation dataset).

Coordinate Descent

In addition to IRLSM and L-BFGS, H2O’s GLM includes options for specifying coordinate descent. Cyclical coordinate descent is able to handle large datasets well and deals efficiently with sparse features. It can improve the performance when the data contains categorical variables with a large number of levels, as it is implemented to deal with such variables in a parallelized way.

  • Coordinate Descent is IRLSM with the covariance updates version of cyclical coordinate descent in the innermost loop. This version is faster when \(N > p\) and \(p\) ~ \(500\).
  • Coordinate Descent Naive is IRLSM with the naive updates version of cyclical coordinate descent in the innermost loop.
  • Coordinate Descent provides much better results if lambda search is enabled. Also, with bounds, it tends to get higher accuracy.
  • Coordinate Descent cannot be used with family=multinomial.

Both of the above methods are explained in “Regularization Paths for Generalized Linear Models via Coordinate Descent” by Friedman et al.
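
To make the naive-update variant concrete, here is a minimal pure-Python sketch of cyclical coordinate descent for a Gaussian response with the elastic net penalty. It is not H2O's implementation (which runs this kind of loop inside IRLSM and in parallel). It assumes centered data with no intercept and non-constant columns, and the function names are hypothetical.

```python
def soft_threshold(value, threshold):
    """Soft-thresholding operator S(z, g) = sign(z) * max(|z| - g, 0)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def coordinate_descent_naive(X, y, alpha, lam, n_sweeps=200, tol=1e-10):
    """Minimize (1/2n)*||y - Xb||^2 + lam*(alpha*|b|_1 + (1-alpha)/2*|b|_2^2).

    X is a list of rows and y a list of responses, both assumed centered.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # residual = y - X @ beta, and beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            # Naive update: correlate column j with the partial residual,
            # i.e. the residual with column j's own contribution added back.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1.0 - alpha))
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


# Toy data with orthogonal, centered columns and y = 2*x1 + 1*x2.
X_demo = [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]]
y_demo = [-3.0, -1.0, 1.0, 3.0]
unpenalized = coordinate_descent_naive(X_demo, y_demo, alpha=1.0, lam=0.0)
lasso_fit = coordinate_descent_naive(X_demo, y_demo, alpha=1.0, lam=1.0)
ridge_fit = coordinate_descent_naive(X_demo, y_demo, alpha=0.0, lam=1.0)
```

On this toy data the LASSO fit drops the weaker predictor entirely while the ridge fit only shrinks both coefficients, which illustrates the sparsity-versus-shrinkage distinction discussed earlier.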

Gradient Descent

For Ordinal regression problems, H2O provides options for gradient descent. Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function. In H2O’s GLM, conventional ordinal regression uses a likelihood function to adjust the model parameters. The model parameters are adjusted by maximizing the log-likelihood function using gradient descent. When the Ordinal family is specified, the solver parameter will automatically be set to GRADIENT_DESCENT_LH. To adjust the model parameters using the squared error loss function instead, you can set the solver parameter to GRADIENT_DESCENT_SQERR.

Coefficients Table

A coefficients table is outputted in a GLM model. This table provides the following information: column names, coefficients, standard error, z-value, p-value, and standardized coefficients.

  • Coefficients are the predictor weights (i.e. the weights used in the actual model used for prediction) in a GLM model.
  • Standard error, z-values, and p-values are classical statistical measures of model quality. p-values are essentially hypothesis tests on the values of each coefficient. A high p-value means that a coefficient is unreliable (insignificant), while a low p-value suggests that the coefficient is statistically significant.
  • The standardized coefficients are returned if the standardize option is enabled (which is the default). These are the predictor weights of the standardized data and are included only for informational purposes (e.g. to compare relative variable importance). In this case, the “normal” coefficients are obtained from the standardized coefficients by reversing the data standardization process (de-scaled, with the intercept adjusted by an added offset) so that they can be applied to data in its original form (i.e. no standardization prior to scoring). Note: These are not the same as coefficients of a model built on non-standardized data.
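
The de-scaling step described in the last bullet can be sketched as follows, assuming each numeric predictor was standardized as (x - mean) / sd. The function name `destandardize` is hypothetical. For columns that are not standardized (the one-hot encoded categorical levels appear to be left as-is, since their two coefficient columns are identical in the example output), pass a mean of 0 and an sd of 1.

```python
def destandardize(std_coefs, std_intercept, means, sds):
    """Map coefficients fitted on (x - mean) / sd back to the original data scale."""
    coefs = [b / s for b, s in zip(std_coefs, sds)]
    # The intercept absorbs the offset created by subtracting the means.
    intercept = std_intercept - sum(b * m / s for b, m, s in zip(std_coefs, means, sds))
    return coefs, intercept


std_coefs, std_intercept = [2.0, -1.0], 0.5
means, sds = [10.0, 4.0], [5.0, 2.0]
coefs, intercept = destandardize(std_coefs, std_intercept, means, sds)

# Both parametrizations give the same linear predictor for a raw observation.
x = [15.0, 6.0]
eta_standardized = std_intercept + sum(
    b * (xi - m) / s for b, xi, m, s in zip(std_coefs, x, means, sds))
eta_original = intercept + sum(c * xi for c, xi in zip(coefs, x))
```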

Extracting Coefficients Table Information

You can extract the columns in the Coefficients Table by specifying names, coefficients, std_error, z_value, p_value, standardized_coefficients in a retrieve/print statement. (Refer to the example that follows.) In addition, H2O provides the following built-in methods for retrieving standard and non-standard coefficients:

  • coef(): Coefficients that can be applied to non-standardized data
  • coef_norm(): Coefficients fitted on the standardized data (requires standardize=TRUE, which is the default)

Example

library(h2o)
h2o.init()

df <- h2o.importFile("http://h2o-public-test-data.s3.amazonaws.com/smalldata/prostate/prostate.csv")
df$CAPSULE <- as.factor(df$CAPSULE)
df$RACE <- as.factor(df$RACE)
df$DCAPS <- as.factor(df$DCAPS)
df$DPROS <- as.factor(df$DPROS)

predictors <- c("AGE", "RACE", "VOL", "GLEASON")
response <- "CAPSULE"

prostate.glm <- h2o.glm(family= "binomial", x= predictors, y=response, training_frame=df, lambda = 0, compute_p_values = TRUE)

# Coefficients that can be applied to the non-standardized data
h2o.coef(prostate.glm)
  Intercept      RACE.1      RACE.2         AGE         VOL     GLEASON
-6.67515539 -0.44278752 -0.58992326 -0.01788870 -0.01278335  1.25035939

# Coefficients fitted on the standardized data (requires standardize=TRUE, which is on by default)
h2o.coef_norm(prostate.glm)
  Intercept      RACE.1      RACE.2         AGE         VOL     GLEASON
-0.07610006 -0.44278752 -0.58992326 -0.11676080 -0.23454402  1.36533415

# Print the coefficients table
prostate.glm@model$coefficients_table
Coefficients: glm coefficients
      names coefficients std_error   z_value  p_value standardized_coefficients
1 Intercept    -6.675155  1.931760 -3.455478 0.000549                 -0.076100
2    RACE.1    -0.442788  1.324231 -0.334373 0.738098                 -0.442788
3    RACE.2    -0.589923  1.373466 -0.429514 0.667549                 -0.589923
4       AGE    -0.017889  0.018702 -0.956516 0.338812                 -0.116761
5       VOL    -0.012783  0.007514 -1.701191 0.088907                 -0.234544
6   GLEASON     1.250359  0.156156  8.007103 0.000000                  1.365334

# Print the standard error
prostate.glm@model$coefficients_table$std_error
[1] 1.931760363 1.324230832 1.373465793 0.018701933 0.007514354 0.156156271

# Print the p values
prostate.glm@model$coefficients_table$p_value
[1] 5.493181e-04 7.380978e-01 6.675490e-01 3.388116e-01 8.890718e-02
[6] 1.221245e-15

# Print the z values
prostate.glm@model$coefficients_table$z_value
[1] -3.4554780 -0.3343734 -0.4295143 -0.9565159 -1.7011907  8.0071033

# Retrieve a graphical plot of the standardized coefficient magnitudes
h2o.std_coef_plot(prostate.glm)

import h2o
h2o.init()
from h2o.estimators.glm import H2OGeneralizedLinearEstimator

prostate = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/prostate/prostate.csv")
prostate['CAPSULE'] = prostate['CAPSULE'].asfactor()
prostate['RACE'] = prostate['RACE'].asfactor()
prostate['DCAPS'] = prostate['DCAPS'].asfactor()
prostate['DPROS'] = prostate['DPROS'].asfactor()

predictors = ["AGE", "RACE", "VOL", "GLEASON"]
response_col = "CAPSULE"

glm_model = H2OGeneralizedLinearEstimator(family= "binomial", lambda_ = 0, compute_p_values = True)
glm_model.train(predictors, response_col, training_frame= prostate)

# Coefficients that can be applied to the non-standardized data.
print(glm_model.coef())
{u'GLEASON': 1.2503593867263176, u'VOL': -0.012783348665664449, u'AGE': -0.017888697161812357, u'Intercept': -6.6751553940827195, u'RACE.2': -0.5899232636956354, u'RACE.1': -0.44278751680880707}

# Coefficients fitted on the standardized data (requires standardize = True, which is on by default)
print(glm_model.coef_norm())
{u'GLEASON': 1.365334151581163, u'VOL': -0.2345440232267344, u'AGE': -0.11676080128780757, u'Intercept': -0.07610006436753876, u'RACE.2': -0.5899232636956354, u'RACE.1': -0.44278751680880707}

# Print the Coefficients table
glm_model._model_json['output']['coefficients_table']
Coefficients: glm coefficients
names      coefficients    std_error    z_value    p_value      standardized_coefficients
---------  --------------  -----------  ---------  -----------  ---------------------------
Intercept  -6.67516        1.93176      -3.45548   0.000549318  -0.0761001
RACE.1     -0.442788       1.32423      -0.334373  0.738098     -0.442788
RACE.2     -0.589923       1.37347      -0.429514  0.667549     -0.589923
AGE        -0.0178887      0.0187019    -0.956516  0.338812     -0.116761
VOL        -0.0127833      0.00751435   -1.70119   0.0889072    -0.234544
GLEASON    1.25036         0.156156     8.0071     1.22125e-15  1.36533

# Print the Standard error
print(glm_model._model_json['output']['coefficients_table']['std_error'])
[1.9317603626604352, 1.3242308316851008, 1.3734657932878116, 0.01870193337051072, 0.007514353657915356, 0.15615627100850296]

# Print the p values
print(glm_model._model_json['output']['coefficients_table']['p_value'])
[0.0005493180609459358, 0.73809783692024, 0.6675489550762566, 0.33881164088847204, 0.0889071809658667, 1.2212453270876722e-15]

# Print the z values
print(glm_model._model_json['output']['coefficients_table']['z_value'])
[-3.4554779791058787, -0.3343733631736653, -0.42951434726559384, -0.9565159284557886, -1.7011907141473064, 8.007103260414265]

# Retrieve a graphical plot of the standardized coefficient magnitudes
glm_model.std_coef_plot()
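
The z-value and p-value columns in the output above are related to the coefficient and standard error columns in a simple way. The sketch below recomputes them for the VOL row, assuming a Wald z-statistic with a two-sided p-value from the standard normal distribution (an assumption that reproduces the printed numbers for this binomial model). The helper name `z_and_p_value` is hypothetical.

```python
import math


def z_and_p_value(coefficient, std_error):
    """Wald z-statistic and two-sided p-value under a standard normal reference."""
    z = coefficient / std_error
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Coefficient and standard error copied from the VOL row of the output above.
z_vol, p_vol = z_and_p_value(-0.012783348665664449, 0.007514353657915356)
```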

Modifying or Creating a Custom GLM Model

In R and Python, the makeGLMModel call can be used to create an H2O model from given coefficients. It needs a source GLM model trained on the same dataset to extract the dataset information. To make a custom GLM model from R or Python:

  • R: call h2o.makeGLMModel. This takes a model, a vector of coefficients, and (optional) decision threshold as parameters.
  • Python: H2OGeneralizedLinearEstimator.makeGLMModel (static method) takes a model, a dictionary containing coefficients, and (optional) decision threshold as parameters.

FAQ

  • How does the algorithm handle missing values during training?
Depending on the selected missing value handling policy, missing values are either imputed with the column mean or the whole row is skipped. The default behavior is Mean Imputation. Note that unseen categorical levels are replaced by the most frequent level present in training (the mode). Optionally, GLM can skip all rows with any missing values.
  • How does the algorithm handle missing values during testing?
Same as during training. If the missing value handling is set to Skip and we are generating predictions, skipped rows will have NA (missing) predictions.
  • What happens if the response has missing values?
The rows with missing responses are ignored during model training and validation.
  • What happens during prediction if the new sample has categorical levels not seen in training?
The value will be filled with either 0 or replaced by the most frequent level present in training (if missing_value_handling was set to MeanImputation).
  • How are unseen categorical values treated during scoring?
Unseen categorical levels are treated based on the missing values handling during training. If your missing value handling was set to Mean Imputation, the unseen levels are replaced by the most frequent level present in training (the mode). If your missing value treatment was Skip, the variable is ignored for the given observation.
  • Does it matter if the data is sorted?
No.
  • Should data be shuffled before training?
No.
  • How does the algorithm handle highly imbalanced data in a response column?
GLM does not require special handling for imbalanced data.
  • What if there are a large number of columns?
IRLS will get quadratically slower with the number of columns. Try L-BFGS for datasets with more than 5-10 thousand columns.
  • What if there are a large number of categorical factor levels?
GLM internally one-hot encodes the categorical factor levels; the same limitations as with a high column count will apply.
  • When building the model, does GLM use all features or a selection of the best features?
Typically, GLM picks the best predictors, especially if lasso is used (alpha = 1). By default, the GLM model includes an L1 penalty and will pick only the most predictive predictors.
  • When running GLM, is it better to create a cluster that uses many smaller nodes or fewer larger nodes?

A rough heuristic would be:

\(nodes \approx M \cdot N^2/(p \cdot 10^8)\)

where \(M\) is the number of observations, \(N\) is the number of columns (categorical columns count as a single column in this case), and \(p\) is the number of CPU cores per node.

For example, a dataset with 250 columns and 1M rows would optimally use about 20 nodes with 32 cores each (following the formula \(250^2 \cdot 1000000/(32 \cdot 10^8) = 19.5 \approx 20\)).
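
The same arithmetic as a small helper, so the heuristic can be tried on other dataset shapes. The function name is hypothetical, and the result is only as reliable as the rule of thumb itself.

```python
import math


def suggested_node_count(n_rows, n_cols, cores_per_node):
    """Rough heuristic from the text: nodes ~= M * N^2 / (p * 1e8)."""
    return n_rows * n_cols ** 2 / (cores_per_node * 1e8)


raw_estimate = suggested_node_count(n_rows=1_000_000, n_cols=250, cores_per_node=32)
nodes = math.ceil(raw_estimate)  # round up to a whole number of nodes
```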

  • How is variable importance calculated for GLM?
For GLM, the variable importance represents the coefficient magnitudes.
  • How does GLM define and check for convergence during logistic regression?

GLM includes three convergence criteria outside of max iterations:

  • beta_epsilon: beta stops changing. This is used mostly with IRLSM.
  • gradient_epsilon: gradient is too small. This is used mostly with L-BFGS.
  • objective_epsilon: relative objective improvement is too small. This is used by all solvers.

The default values below are based on a heuristic:

  • The default for beta_epsilon is 1e-4.
  • The default for gradient_epsilon is 1e-6 if there is no regularization (lambda = 0) or you are running with lambda_search; 1e-4 otherwise.
  • The default for objective_epsilon is 1e-6 if lambda = 0; 1e-4 otherwise.

The default for max_iterations depends on the solver type and whether you run with lambda search:

  • For IRLSM, the default is 50 without lambda search; otherwise, it is 10 * the number of lambdas.
  • For L-BFGS, the default is the number of classes (1 if not classification) * max(20, number of predictors / 4) without lambda search; with lambda search, it is the number of classes * 100 * the number of lambdas.

You will receive a warning if you reach the maximum number of iterations. In some cases, GLM can end prematurely if it cannot progress forward via line search. This typically happens when running a lambda search with the IRLSM solver. Note that using the COORDINATE_DESCENT solver fixes the issue.

  • Why do I receive different results when I run R’s glm and H2O’s glm?
H2O’s glm and R’s glm do not run the same way and, thus, will provide different results. This is mainly due to the fact that H2O’s glm uses H2O math, H2O objects, and H2O distributed computing. Additionally, H2O’s glm by default adds regularization, so it is essentially solving a different problem.
  • How can I get H2O’s GLM to match R’s `glm()` function?

There are a few arguments you need to set in order to get H2O’s GLM to match R’s glm because by default, they do not function the same way. To match R’s glm, you must set the following in H2O’s GLM:

solver = "IRLSM"
lambda = 0
remove_collinear_columns = TRUE
compute_p_values = TRUE

Note: beta_constraints must not be set.
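
For the Python client, the same settings might be collected as keyword arguments, as in the sketch below. Note that the Python spelling of the lambda argument is `lambda_`. The names `R_GLM_MATCHING_PARAMS` and `build_r_like_glm` are hypothetical, and the Gaussian default family is only a placeholder.

```python
R_GLM_MATCHING_PARAMS = {
    "solver": "IRLSM",
    "lambda_": 0,  # no regularization
    "remove_collinear_columns": True,
    "compute_p_values": True,
}


def build_r_like_glm(family="gaussian"):
    """Create an (untrained) estimator configured with the settings listed above."""
    # Imported lazily so the settings can be inspected without H2O installed.
    from h2o.estimators.glm import H2OGeneralizedLinearEstimator
    return H2OGeneralizedLinearEstimator(family=family, **R_GLM_MATCHING_PARAMS)
```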

GLM Algorithm

Following the definitive text by P. McCullagh and J.A. Nelder (1989) on the generalization of linear models to non-linear distributions of the response variable Y, H2O fits GLM models based on the maximum likelihood estimation via iteratively reweighted least squares.

Let \(y_{1},…,y_{n}\) be n observations of the independent, random response variable \(Y_{i}\).

Assume that the observations are distributed according to a function from the exponential family and have a probability density function of the form:

\(f(y_{i})=\exp\left[\frac{y_{i}\theta_{i} - b(\theta_{i})}{a_{i}(\phi)} + c(y_{i}; \phi)\right]\) where \(\theta\) and \(\phi\) are location and scale parameters, and \(a_{i}(\phi)\), \(b(\theta_{i})\), and \(c(y_{i}; \phi)\) are known functions.

\(a_{i}\) is of the form \(a_{i}= \frac{\phi}{p_{i}}\) where \(p_{i}\) is a known prior weight.

When \(Y\) has a pdf from the exponential family:

\(E(Y_{i})=\mu_{i}=b^{\prime}(\theta_{i})\)

\(var(Y_{i})=\sigma_{i}^2=b^{\prime\prime}(\theta_{i})a_{i}(\phi)\)
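
As a quick check of the mean and variance identities, consider the Poisson distribution (a standard textbook case, not specific to H2O). Its probability mass function can be written in the exponential family form as:

\(f(y_{i})=\exp[y_{i}\log\mu_{i} - \mu_{i} - \log(y_{i}!)]\)

so that \(\theta_{i}=\log\mu_{i}\), \(b(\theta_{i})=e^{\theta_{i}}\), \(a_{i}(\phi)=1\), and \(c(y_{i}; \phi)=-\log(y_{i}!)\). Differentiating \(b\) gives \(b^{\prime}(\theta_{i})=e^{\theta_{i}}=\mu_{i}\) and \(b^{\prime\prime}(\theta_{i})a_{i}(\phi)=e^{\theta_{i}}=\mu_{i}\), which recovers the familiar result that a Poisson variable has mean and variance both equal to \(\mu_{i}\).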

Let \(g(\mu_{i})=\eta_{i}\) be a monotonic, differentiable transformation of the expected value of \(y_{i}\). The function \(g\) is the link function, and the linear predictor \(\eta_{i}\) follows a linear model.

\(g(\mu_{i})=\eta_{i}=\mathbf{x_{i}^{\prime}}\beta\)

When inverted: \(\mu_{i}=g^{-1}(\mathbf{x_{i}^{\prime}}\beta)\)
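
To illustrate the link and its inverse, the sketch below defines three common pairs \((g, g^{-1})\) and applies the inverse link to a linear predictor. The dictionary `LINKS` and the function `predict_mean` are hypothetical names used only for this example.

```python
import math

# (link g, inverse link g^-1) pairs for three common choices.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
}


def predict_mean(x, beta, link="logit"):
    """Compute mu = g^-1(x' beta) for one observation.

    Include a leading 1 in `x` if `beta` contains an intercept.
    """
    eta = sum(xj * bj for xj, bj in zip(x, beta))
    inverse_link = LINKS[link][1]
    return inverse_link(eta)


# With all-zero coefficients the linear predictor is 0.
mu_logit = predict_mean([1.0, 2.0], [0.0, 0.0], link="logit")
mu_log = predict_mean([1.0, 2.0], [0.0, 0.0], link="log")
```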

Maximum Likelihood Estimation

Starting from an initial rough estimate of the parameters \(\hat{\beta}\), use the estimate to generate fitted values: \(\hat{\mu}_{i}=g^{-1}(\hat{\eta}_{i})\)

Let \(z\) be a working dependent variable such that \(z_{i}=\hat{\eta_{i}}+(y_{i}-\hat{\mu_{i}})\frac{d\eta_{i}}{d\mu_{i}}\),

where \(\frac{d\eta_{i}}{d\mu_{i}}\) is the derivative of the link function evaluated at the trial estimate.

Calculate the iterative weights: \(w_{i}=\frac{p_{i}}{b^{\prime\prime}(\theta_{i})\left(\frac{d\eta_{i}}{d\mu_{i}}\right)^{2}}\)

where \(b^{\prime\prime}\) is the second derivative of \(b(\theta_{i})\) evaluated at the trial estimate.

Assume \(a_{i}(\phi)\) is of the form \(\frac{\phi}{p_{i}}\). The weight \(w_{i}\) is inversely proportional to the variance of the working dependent variable \(z_{i}\) for current parameter estimates and proportionality factor \(\phi\).

Regress \(z_{i}\) on the predictors \(x_{i}\) using the weights \(w_{i}\) to obtain new estimates of \(\beta\).

\(\hat{\beta}=(\mathbf{X}^{\prime}\mathbf{W}\mathbf{X})^{-1}\mathbf{X}^{\prime}\mathbf{W}\mathbf{z}\)

where \(\mathbf{X}\) is the model matrix, \(\mathbf{W}\) is a diagonal matrix of \(w_{i}\), and \(\mathbf{z}\) is a vector of the working response variable \(z_{i}\).

This process is repeated until the estimates \(\hat{\beta}\) change by less than the specified amount.
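
The iteration above can be sketched in pure Python for the simplest interesting case: logistic regression with an intercept and one predictor, with prior weights \(p_{i}=1\). For the logit link, \(b^{\prime\prime}(\theta_{i})=\mu_{i}(1-\mu_{i})\) and \(\frac{d\eta_{i}}{d\mu_{i}}=\frac{1}{\mu_{i}(1-\mu_{i})}\), so the weight reduces to \(w_{i}=\mu_{i}(1-\mu_{i})\). This is an illustrative sketch, not H2O's distributed implementation, and the function name `irls_logistic` is hypothetical.

```python
import math


def irls_logistic(x, y, n_iter=25, tol=1e-10):
    """Fit logit(mu_i) = b0 + b1 * x_i by iteratively reweighted least squares."""
    b0, b1 = 0.0, 0.0  # rough initial estimate
    for _ in range(n_iter):
        # Accumulate the entries of X'WX and X'Wz for the 2x2 normal equations.
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            mu = 1.0 / (1.0 + math.exp(-eta))
            var = mu * (1.0 - mu)      # b''(theta_i) for the Bernoulli case
            z = eta + (yi - mu) / var  # working response, d(eta)/d(mu) = 1 / var
            w = var                    # 1 / (b'' * (d eta / d mu)^2) = var
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Solve (X'WX) beta = X'Wz with the closed-form 2x2 inverse.
        det = s_w * s_wxx - s_wx * s_wx
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        converged = max(abs(new_b0 - b0), abs(new_b1 - b1)) < tol
        b0, b1 = new_b0, new_b1
        if converged:  # estimates changed by less than the specified amount
            break
    return b0, b1


# Small, non-separable toy dataset.
x_demo = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y_demo = [0, 0, 1, 0, 1, 1]
b0_hat, b1_hat = irls_logistic(x_demo, y_demo)
```

At convergence the score equations \(\sum_i (y_{i}-\mu_{i})=0\) and \(\sum_i x_{i}(y_{i}-\mu_{i})=0\) hold, which is a convenient way to check that the loop has reached the maximum likelihood estimate.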

Cost of computation

H2O can process large data sets because it relies on parallel processes. Large data sets are divided into smaller data sets and processed simultaneously, and the results are communicated between computers as needed throughout the process.

In GLM, data are split by rows but not by columns, because the predicted Y values depend on information in each of the predictor variable vectors. If O is a complexity function, N is the number of observations (or rows), and p is the number of predictors (or columns), then

\(Runtime \propto p^3+\frac{(N*p^2)}{CPUs}\)

Distribution reduces the time it takes an algorithm to process because it decreases N.

Relative to p, the larger that (N/CPUs) becomes, the more trivial p becomes to the overall computational cost. However, when p is greater than (N/CPUs), O is dominated by p.

  \(Complexity = O(p^3 + N*p^2)\)
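
The two regimes can be seen by evaluating both terms of the runtime expression for a tall dataset and a wide one. The numbers are in arbitrary units, and the function name is hypothetical.

```python
def relative_runtime(n_rows, n_predictors, n_cpus):
    """Return (total, p^3 term, N*p^2/CPUs term) in arbitrary units."""
    p_cubed = n_predictors ** 3
    distributed = n_rows * n_predictors ** 2 / n_cpus
    return p_cubed + distributed, p_cubed, distributed


# Tall and narrow: p is far below N/CPUs, so the row-dependent term dominates.
tall = relative_runtime(n_rows=1_000_000, n_predictors=100, n_cpus=32)
# Wide: p exceeds N/CPUs (31,250 here), so the p^3 term dominates.
wide = relative_runtime(n_rows=1_000_000, n_predictors=50_000, n_cpus=32)
```

Adding CPUs only shrinks the second term, which is why very wide datasets benefit less from a larger cluster.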

For more information about how GLM works, refer to the Generalized Linear Modeling booklet.

References

Breslow, N E. “Generalized Linear Models: Checking Assumptions and Strengthening Conclusions.” Statistica Applicata 8 (1996): 23-41.

Frome, E L. “The Analysis of Rates Using Poisson Regression Models.” Biometrics (1983): 665-674.

Goldberger, Arthur S. “Best Linear Unbiased Prediction in the Generalized Linear Regression Model.” Journal of the American Statistical Association 57.298 (1962): 369-375.

Guisan, Antoine, Thomas C Edwards Jr, and Trevor Hastie. “Generalized Linear and Generalized Additive Models in Studies of Species Distributions: Setting the Scene.” Ecological modeling 157.2 (2002): 89-100.

Nelder, John A, and Robert WM Wedderburn. “Generalized Linear Models.” Journal of the Royal Statistical Society. Series A (General) (1972): 370-384.

Lee, Y and Nelder, J. A. Hierarchical generalized linear models with discussion. J. R. Statist. Soc. B, 58:619-678, 1996.

Lee, Y and Nelder, J. A. and Y. Pawitan. Generalized linear models with random effects. Chapman & Hall/CRC, 2006.

Pearce, Jennie, and Simon Ferrier. “Evaluating the Predictive Performance of Habitat Models Developed Using Logistic Regression.” Ecological modeling 133.3 (2000): 225-245.

Press, S James, and Sandra Wilson. “Choosing Between Logistic Regression and Discriminant Analysis.” Journal of the American Statistical Association 73.364 (April, 2012): 699–705.

Snee, Ronald D. “Validation of Regression Models: Methods and Examples.” Technometrics 19.4 (1977): 415-428.
