A Multilevel-Weighted Mixed Effects Logistic Regression Model Approach to Assess Cluster Variation Impact on Under-Five Malaria Risk in Uganda

Data source and study population

This study used secondary data from the 2018/19 Uganda Malaria Indicator Survey (UMIS), which followed a two-stage stratified cluster sampling design. The first stage of sampling involved selecting sample points (clusters) from the sampling frames. A total of 320 clusters were selected with probability proportional to size from the enumeration areas (EAs) covered in the 2014 National Population and Housing Census (NPHC). The second stage of sampling involved systematic selection of households. Twenty-eight households were selected from each EA, for a total sample size of 8,878 households. The primary objective of the 2018/19 UMIS was to provide up-to-date estimates of basic demographic and health indicators related to malaria. Specifically, the survey collected information on vector control interventions such as mosquito nets and indoor residual spraying of insecticides, on intermittent preventive treatment of malaria in pregnant women, on care-seeking and treatment of fever in children, and on malaria knowledge, behaviour, and practices. All women aged 15–49 years who were either permanent residents of the selected households or visitors who stayed in the household the night before the survey were eligible to be interviewed. After a parent’s or guardian’s consent was obtained, children aged 0–59 months were tested for anaemia and malaria infection. The study population consisted of 7,632 children under 5 years of age who were tested for anaemia and malaria infection by a team of two health technicians [16]. The selection of the final study sample is shown in Fig. 1.
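The two sampling stages described above can be illustrated with a minimal sketch. It assumes a hypothetical sampling frame (2,000 EAs with made-up household counts) and uses systematic selection with probability proportional to size for the first stage; the function names and the frame are illustrative only and are not taken from the UMIS sampling procedure.

```python
import random

def pps_systematic(sizes, n_select, rng):
    """Systematic PPS: return indices of n_select units, chance proportional to size."""
    step = sum(sizes) / n_select      # sampling interval on the cumulative size scale
    start = rng.random() * step       # random start within the first interval
    targets = [start + i * step for i in range(n_select)]
    selected, cumulative, t = [], 0, 0
    for idx, size in enumerate(sizes):
        cumulative += size
        # A unit is picked once for every target that falls inside its size interval.
        while t < n_select and targets[t] < cumulative:
            selected.append(idx)
            t += 1
    return selected

def systematic_sample(n_listed, n_select, rng):
    """Pick n_select positions from a list of n_listed households at a fixed interval."""
    interval = n_listed / n_select
    start = rng.random() * interval
    return [int(start + i * interval) for i in range(n_select)]

rng = random.Random(2018)
# Hypothetical frame: household counts per EA (not real census figures).
ea_households = [rng.randint(80, 200) for _ in range(2000)]

clusters = pps_systematic(ea_households, 320, rng)            # stage 1: 320 clusters
households = {k: systematic_sample(ea_households[k], 28, rng)  # stage 2: 28 per cluster
              for k in clusters}
print(len(clusters), "clusters selected")
```

Because larger EAs are more likely to be chosen at the first stage while a fixed number of households is taken at the second, the two stages partly offset each other in the overall selection probability of a household.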

Fig. 1 Flow chart showing selection of the study participants

Analysis model

The dataset was first explored and prepared for analysis. Before any analysis was conducted, the data were sorted, some variables were recoded, and variables and observations that were not relevant to the research problem were excluded. Categorical variables were summarized as counts and percentages. Collinearity among independent variables was assessed using a correlation matrix, and variables with a correlation coefficient of 0.4 or above were not included in the same model. The survey design estimation command (svy) in Stata 15.0 (StataCorp, College Station, TX) was used to conduct descriptive analysis, accounting for the level weights. The level of statistical significance was p < 0.05 for all analyses. Overall, four multivariable models were considered: the first model adjusted for neither weighting nor cluster variation in the risk of under-five malaria; the second model adjusted only for cluster variation; the third model adjusted only for weighting; and the fourth model adjusted for both weighting and cluster variation. The model with generally lower design factor (deft) values was considered to best fit the data, because lower deft values are associated with a lower loss of precision in model estimates [17]. The design factor (deft) was calculated as follows:
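As an illustration of the collinearity screen, the sketch below flags pairs of variables whose correlation reaches the 0.4 cut-off. It assumes Pearson correlation and compares the absolute value of the coefficient with the threshold; the source does not state which coefficient was used or how negative correlations were handled, and the variable names and toy data are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def collinear_pairs(data, threshold=0.4):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for a, b in combinations(data, 2):
        r = pearson(data[a], data[b])
        if abs(r) >= threshold:   # assumption: the sign of r is ignored
            flagged.append((a, b, round(r, 3)))
    return flagged

# Toy data with hypothetical variable names.
toy = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 10],
    "x3": [5, 1, 4, 2, 3],
}
flagged = collinear_pairs(toy)
print(flagged)
```

Variables appearing together in a flagged pair would then be kept out of the same model, with the choice of which one to retain left to subject-matter judgment.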

$$deft=\sqrt{def}=\sqrt{1+\rho \left(n-1\right)}$$

where \(def\) is the design effect, \(\rho\) is the intra-class correlation for the variable in question, and \(n\) is the size of the cluster.
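For a sense of scale, the design factor can be computed directly from the quantities defined above. This sketch assumes the standard relations (design effect equal to one plus the intra-class correlation times the cluster size minus one, and deft as its square root); the input values are illustrative and are not estimates from the UMIS data.

```python
import math

def design_effect(rho, n):
    """Design effect for intra-class correlation rho and cluster size n."""
    return 1 + rho * (n - 1)

def design_factor(rho, n):
    """Design factor (deft): square root of the design effect."""
    return math.sqrt(design_effect(rho, n))

# Illustrative values only: rho = 0.05 with 28 units per cluster.
example_def = design_effect(0.05, 28)
example_deft = design_factor(0.05, 28)
print(round(example_def, 2), round(example_deft, 3))
```

A deft of 1 means the clustered design is as precise as a simple random sample of the same size; values above 1 indicate wider standard errors by that factor.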

To assess the association between malaria infection in under-five children and individual, household, and enumeration area factors, a multilevel-weighted mixed effects logistic regression model (selected as the best of the four compared models) was specified to account for contextual within-household and within-EA correlations [18,19,20]. The model is specified as follows:

$$ln\left(\frac{{p}_{ijk}}{1-{p}_{ijk}}\right)={\beta }_{0}+ {\beta }_{1}{X}_{ijk}+ {\eta }_{k}+ {\xi }_{jk}$$

where \(ln\) is the natural logarithm and \({p}_{ijk}\) is the probability of testing positive for malaria for the \(i\)th under-five child in household \(j\) and EA \(k\).

\({\beta }_{0}\) is the mean log-odds of malaria across households and EAs.

\({X}_{ijk}\) is a level 1 covariate for the \(i\)th child in household \(j\) and EA \(k\).

\({\beta }_{1}\) is the slope associated with \({X}_{ijk}\), that is, the relationship between the level 1 covariate and the log-odds of malaria.

\({\eta }_{k}\) is the random effect for EA \(k\).

\({\xi }_{jk}\) is the random effect for household \(j\) in EA \(k\).
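To make the roles of the fixed and random terms concrete, the sketch below converts the linear predictor of the model into a predicted probability through the inverse logit. The coefficient and random effect values are made up for illustration and are not estimates from this study.

```python
import math

def malaria_probability(beta0, beta1, x_ijk, eta_k, xi_jk):
    """Predicted probability of a positive test for child i in household j, EA k."""
    # Linear predictor on the log-odds scale: fixed part plus EA and household effects.
    log_odds = beta0 + beta1 * x_ijk + eta_k + xi_jk
    # Inverse logit maps log-odds back to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical values: the same child placed in an average EA and in a high-risk EA.
p_average_ea = malaria_probability(-1.0, 0.5, 1.0, eta_k=0.0, xi_jk=0.0)
p_high_risk_ea = malaria_probability(-1.0, 0.5, 1.0, eta_k=0.5, xi_jk=0.0)
print(round(p_average_ea, 3), round(p_high_risk_ea, 3))
```

Two children with identical covariates can therefore have different predicted risks solely because their EA or household random effects differ, which is the cluster variation the model is designed to capture.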

Bivariate multilevel-weighted mixed effects logistic regression was conducted using each of the individual, household, and community level risk factors as a predictor and the malaria test result as the outcome. Predictors with p < 0.20 were considered for inclusion in the multivariable multilevel logistic regression models. The multivariable analysis was conducted sequentially, resulting in several models. Model 0 (the null model) included only the random intercepts and was fitted to decompose the total variance in malaria risk into its cluster-level (EA and household) and individual-level components before explanatory variables were added. The null model thus established the degree of variance at the cluster level in order to validate the use of multilevel modelling. Model 1 contained individual (level-1) variables; model 2 added household (level-2) variables to those in model 1; and model 3 added EA (level-3) variables to those in model 2. Model 3 was selected as the final model for identifying factors associated with malaria risk in under-five children since it was the most complete of the three models. The intra-class correlation coefficient (ICC) was used to measure the extent to which individuals within the same group are more similar to each other than they are to individuals in different groups [21]. A higher ICC was taken to indicate a stronger general contextual effect [22]. The formula for the ICC is presented below:

$$ICC\; = \;\frac{V_A }{{V_A \; + \;{{\pi^2 } / 3}}}$$

where \({V}_{A}\) is the cluster (area) level variance and \(\pi^2 /3\) (approximately 3.29) is the variance of the standard logistic distribution, which corresponds to the individual level variance. When a level contributed very little to the overall ICC (< 10%), the random effects component at that level was considered insignificant.
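The ICC formula and the 10% rule can be sketched as follows. The variance values are illustrative, and the helper name `is_negligible` is hypothetical.

```python
import math

LOGISTIC_VARIANCE = math.pi ** 2 / 3   # individual level variance, about 3.29

def icc(v_a):
    """ICC for a cluster (area) level variance v_a in a logistic model."""
    return v_a / (v_a + LOGISTIC_VARIANCE)

def is_negligible(v_a, cutoff=0.10):
    """True when the level contributes less than the cutoff share of total variance."""
    return icc(v_a) < cutoff

# Illustrative variances only, not estimates from this study.
icc_large = icc(1.0)
icc_small = icc(0.3)
print(round(icc_large, 3), round(icc_small, 3), is_negligible(0.3))
```

Under this rule a cluster level variance has to exceed roughly 0.37 on the log-odds scale before the level contributes 10% or more of the total variance.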


Since the sample for this study was a two-stage stratified cluster sample, level weights were calculated separately for each sampling stage and cluster, based on the corresponding sampling probabilities. Level weights were estimated using a framework for approximating level weights in Malaria Indicator Surveys (MIS) proposed by the Demographic and Health Surveys (DHS) Program [11]. The framework requires data that are included in the publicly available UMIS datasets and final report.
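The idea behind level weights can be illustrated with textbook inverse-probability weights for each sampling stage. This is a sketch under stated assumptions, not the DHS approximation framework cited above: it assumes the stage-wise selection probabilities are known, and all function names and input values are hypothetical.

```python
def cluster_weight(n_clusters_selected, ea_size, stratum_size):
    """Stage 1 weight: inverse of the PPS selection probability of an EA."""
    # P(EA selected) = n_clusters_selected * ea_size / stratum_size
    return stratum_size / (n_clusters_selected * ea_size)

def household_weight(n_households_selected, n_households_listed):
    """Stage 2 weight: inverse of the selection probability of a household within its EA."""
    # P(household selected | EA selected) = n_households_selected / n_households_listed
    return n_households_listed / n_households_selected

# Hypothetical stratum: 30,000 households, 10 EAs drawn, one EA with 150 households
# at the census and 140 households listed at the time of the survey.
w_ea = cluster_weight(10, 150, 30000)
w_hh = household_weight(28, 140)
w_overall = w_ea * w_hh   # overall design weight of a household
print(w_ea, w_hh, w_overall)
```

In a multilevel model the stage-specific weights enter at their own levels (the EA weight at the EA level, the within-EA weight at the household level) rather than being applied only as a single overall weight.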