Empirical framework

Context

Tanzania is one of the largest countries in Eastern and Central Africa, and an important source of the region’s maize production. However, most of this production comes from smallholders who have relatively low levels of productivity, and few of whom use modern inputs such as fertilizer. As such, raising maize yields has been an important investment and policy target for the country and its partners in recent years. Tanzania is representative in many ways of the maize-based farming systems found elsewhere in the region, in terms of its agroecologies and range of biophysical endowments, the predominant production characteristics of its smallholder farmers, and the relatively low levels of market infrastructure development. At the same time, the heterogeneity of production characteristics found within Tanzania’s maize growing areas makes it a useful test case for evaluating how agronomic responses vary across key geographical characteristics (Nord & Snapp 2020).

Data

Farm household survey data were collected in Tanzania in 2016 and 2017 from 624 households, located in 25 districts (Figure 1). These districts are located in both the Southern Highlands and Northern zone, representing the most important maize growing areas in the country. Within each district, a stratified sampling frame that maximized soil type variability was used to identify survey localities, so as to be able to make broad inferences about crop response (Walsh & Vågen 2006; Shepherd et al., 2015). Within each locality, a listing of all maize producing households was generated with the assistance of the local headman. From this listing, 24 households in each locality were randomly selected. Data were collected on household demographics, farm and non-farm economic portfolios, land holdings and productive assets, and other characteristics. Within each farm household, basic information was collected for each plot managed by the household (e.g. land use status, production decisions). In addition, very detailed agronomic management information was collected for each household’s most important maize plot (henceforth the farm’s “focal plot”). This plot was identified by the farmer as the plot which generated the most maize production, and which received the most managerial effort.
Nitrogen and other macronutrient supplies were calculated from the various fertilizer blends farmers reported using. To account for implausible values, we replaced application rates exceeding 700 kg ha\(^{-1}\) of N with that value, which was tantamount to winsorizing at the 99th percentile of N application rates for fertilizer users, and which follows the protocols used by Liverpool-Tasie et al. (2017) and Sheahan and Barrett (2017).
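The top-coding rule described above can be sketched in a few lines. This is a minimal illustration, not the code used for the analysis; the function name and the example rates are hypothetical, and only the 700 kg ha\(^{-1}\) cap comes from the text.

```python
import numpy as np

N_CAP_KG_HA = 700.0  # cap stated in the text


def top_code_n_rates(n_rates_kg_ha, cap=N_CAP_KG_HA):
    """Replace N application rates above `cap` with `cap`; leave others unchanged."""
    rates = np.asarray(n_rates_kg_ha, dtype=float)
    return np.minimum(rates, cap)


# Made-up application rates (kg N per ha), for illustration only.
example_rates = [0.0, 45.5, 120.0, 950.0]
capped = top_code_n_rates(example_rates)
```

Only the upper tail is altered, so plots with zero or moderate application rates keep their reported values.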
Maize yields on focal plots were measured using crop cuts from three 5x5 meter quadrants, calculated at 12.5% grain moisture content. Soil characteristics from these plots were measured from samples taken at quadrant locations at 0-10 and 10-20 cm depths.
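For readers unfamiliar with crop-cut calculations, the following sketch shows one common way to convert harvested grain weight to a per-hectare yield at a standard moisture content. It assumes that grain from the three 5x5 meter cuts is pooled (75 square meters in total) and that moisture is expressed on a wet-weight basis; the function name and example numbers are hypothetical, and the exact procedure used in the survey may differ.

```python
def yield_kg_per_ha(grain_weight_kg, measured_moisture_pct,
                    harvested_area_m2=75.0, standard_moisture_pct=12.5):
    """Convert crop-cut grain weight to kg/ha at a standard moisture content.

    Assumes wet-basis moisture percentages and a pooled harvested area
    (default: three 5x5 m cuts = 75 m^2).
    """
    # Dry matter contained in the harvested grain.
    dry_matter_kg = grain_weight_kg * (100.0 - measured_moisture_pct) / 100.0
    # Weight the same dry matter would have at the standard moisture content.
    standardized_kg = dry_matter_kg * 100.0 / (100.0 - standard_moisture_pct)
    # Scale from the harvested area to one hectare (10,000 m^2).
    return standardized_kg * 10_000.0 / harvested_area_m2


# Illustrative values only: 30 kg of grain at 20% moisture from 75 m^2.
example_yield = yield_kg_per_ha(30.0, 20.0)
```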
Total organic carbon, despite its well-recognized importance as an indicator of overall soil quality, is not an ideal indicator of nutrient availability because much of the bulk soil organic matter is relatively inert (Drinkwater et al., 1998). Soil organic carbon is largely conditioned by topography and soil parent material; however, once a field is converted to agriculture, active soil organic matter fractions largely determine soil productivity, and this is markedly influenced by farmer practices (Zingore et al., 2008). Thus, rather than testing for total carbon, as is often the case in standardized soil testing, testing the active organic matter pool provides better insight into how changes in management affect nutrient cycling and potential soil C accumulation or loss (Haynes, 2005; Wander, 2004). The active carbon pool, while constituting a small fraction (5–20%) of the soil’s total organic matter, is the component that greatly influences key soil functions, such as nutrient cycling and availability, soil aggregation, and soil C accumulation (Grandy and Robertson, 2007; Schmidt et al., 2011; Six et al., 1998; Wander, 2004). Hence, in this analysis, we focus on the factors influencing active carbon.
Developments in laboratory assays to monitor ‘active’ soil organic matter fractions have highlighted the value of permanganate oxidizable carbon (POXC) as an early indicator of management influence on soil organic carbon (Culman et al., 2012). Total soil organic carbon also provides insights regarding sustainable soil management, although at a slow timestep (five to ten years). For this work, POXC was determined on a ground (1 mm sieve) sub-sample oxidized with 0.02 M KMnO\(_{4}\), with absorbance subsequently read at a wavelength of 550 nm (ibid.). To address potential measurement error, and under the assumption that the soil properties of interest here (particularly soil active carbon) are relatively stable, we use the average measure across the two years for each plot in our regression work.
Rainfall was measured as the sum of dekadal values recorded for the main growing season, using the CHIRPS dataset (Funk et al., 2017). Rainfall variability was measured as the coefficient of variation on the dekadal observations within a season.
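The two rainfall measures can be illustrated as follows. This is a sketch under stated assumptions: the input is a list of dekadal totals (mm) for one plot-season already extracted from CHIRPS, and the coefficient of variation uses the sample standard deviation, which the text does not specify. The function name is hypothetical.

```python
import numpy as np


def seasonal_rainfall_stats(dekadal_mm):
    """Return (seasonal total, coefficient of variation) for dekadal rainfall."""
    values = np.asarray(dekadal_mm, dtype=float)
    total = values.sum()
    mean = values.mean()
    if mean == 0:
        # CV is undefined for a season with no recorded rainfall.
        return total, float("nan")
    # Assumption: sample standard deviation (ddof=1).
    cv = values.std(ddof=1) / mean
    return total, cv


# Illustrative dekadal totals (mm), not survey data.
total_mm, cv = seasonal_rainfall_stats([10.0, 20.0, 30.0])
```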

Estimation strategy

The intent of this paper is to understand the agronomic and economic returns to nitrogen fertilizer applications in smallholder maize production. In keeping with the agronomic and agricultural economics literature, we frame maize yield (y) as a function of fertilizer application rates (F), other agronomic management decisions (M), and other exogenous conditioners (G).
\(y\ =\ f(\mathbf{F},\mathbf{M},\mathbf{G})\) (1)
Because farmers in Tanzania use a variety of fertilizer blends, we integrate these decisions by decomposing each blend into its macronutrient content, i.e. nitrogen (N), phosphorus (P) and potassium (K). Other management factors include improved maize seed, maize-legume intercropping (common in the Southern Highlands), organic matter integration via compost, manure and crop residue retention, plant spacing, weeding, fallowing, terracing and erosion control structures, and herbicide and pesticide applications. Other exogenous conditioners include slope, rainfall, rainfall variability and the presence of disease or striga (witchweed).
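As an illustration of the blend decomposition, the sketch below converts product application rates into elemental N, P and K. The grade table is a small illustrative set of common label grades (N-P2O5-K2O), not the list of products recorded in the survey, and it assumes that labels report phosphate and potash on an oxide basis; the function and variable names are hypothetical.

```python
# Standard mass-fraction conversions from oxide to elemental form.
P2O5_TO_P = 0.4364
K2O_TO_K = 0.8301

# Illustrative label grades as (% N, % P2O5, % K2O); an assumption, not survey data.
ILLUSTRATIVE_GRADES = {
    "urea": (46.0, 0.0, 0.0),
    "dap": (18.0, 46.0, 0.0),
    "npk_17_17_17": (17.0, 17.0, 17.0),
}


def nutrients_kg_ha(applications_kg_ha, grades=ILLUSTRATIVE_GRADES):
    """Sum elemental N, P and K (kg/ha) over all products applied to a plot."""
    n = p = k = 0.0
    for product, rate in applications_kg_ha.items():
        n_pct, p2o5_pct, k2o_pct = grades[product]
        n += rate * n_pct / 100.0
        p += rate * p2o5_pct / 100.0 * P2O5_TO_P
        k += rate * k2o_pct / 100.0 * K2O_TO_K
    return {"N": n, "P": p, "K": k}


# Example: 50 kg/ha of urea plus 50 kg/ha of DAP on one plot.
example = nutrients_kg_ha({"urea": 50.0, "dap": 50.0})
```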
We adopt a flexible polynomial functional form, allowing for quadratic terms and interactions between variables. In this approach, we follow similar empirical studies (e.g. Burke et al., 2017; Sheahan et al., 2013; Xu et al., 2009). This flexibility is important in enabling us to investigate how yield response to nitrogen is conditioned by other factors. We may generalize this function as:
\(y_{\text{it}}\ =\ \alpha+\beta_{1}N_{\text{it}}+\beta_{2}N_{\text{it}}^{2}+\beta_{10}\mathbf{X}_{\text{it}}+\beta_{11}N_{\text{it}}\mathbf{X}_{\text{it}}+u_{\text{it}}\) (2)
where N is nitrogen, our primary input of interest, i indexes plots, t indexes observations over time, and where, for convenience, we have subsumed M and G in the vector \(\mathbf{X}\). As indicated earlier, a priori hypotheses include the possibility of positive interactions between nitrogen, soil organic carbon and rainfall, after controlling for other factors.
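To make the role of the interaction terms concrete, differentiating equation 2 with respect to N gives the marginal yield response implied by this functional form (a direct consequence of the specification as written, not an additional estimated result):
\(\frac{\partial y_{\text{it}}}{\partial N_{\text{it}}}\ =\ \beta_{1}+2\beta_{2}N_{\text{it}}+\beta_{11}\mathbf{X}_{\text{it}}\)
Under this expression, a negative \(\beta_{2}\) implies diminishing returns to N, while positive elements of \(\beta_{11}\) (for example, on soil carbon or rainfall) imply that the response to N rises with the corresponding conditioner.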
A key consideration is the possibility that unobserved factors may bias our estimation results. Concretely, we may decompose the residual in equation 2 as:
\(u_{\text{it}}\ =\ o_{\text{it}}+\ c_{i}+\ \epsilon_{\text{it}}\) (3)
where \(o\) represents unobserved time-varying factors, \(c\) represents unobserved time-constant factors, and \(\epsilon\) is a randomly distributed error term. Time-varying unobservables may include soil moisture, nutrient status or other factors which are often missing from empirical studies (or poorly measured). Time-constant unobservables may include farmer ability or plot biophysical characteristics which change little from year to year, but which may affect both fertilizer usage and yield outcomes. Finally, correlation between model covariates and the stochastic error term may be an additional source of bias. Burke et al. (2017) provide a useful, detailed discussion of these issues and corresponding identification strategies in survey data settings.
In the present study, we argue that our dataset does a better job at controlling for time-varying plot and plot-management factors than is typically the case in empirical studies, and therefore the unobserved \(o_{\text{it}}\) is unlikely to be a major issue. Our larger concern is with time-invariant unobserved farmer and plot-level heterogeneity, which is likely to bias our results upward if not addressed (e.g. under the assumption that more able farmers are more likely to use fertilizer than less able farmers). To address this, we estimate models with the Mundlak-Chamberlain device (i.e. the Correlated Random Effects model; Wooldridge 2010), as well as a Fixed Effects estimator.
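The logic of the Mundlak-Chamberlain device can be illustrated on simulated data. The sketch below is not the paper's estimation code: it uses a single regressor, a linear response, made-up parameter values and no year effects, and all variable names are hypothetical. It shows that adding the plot-level mean of N to a pooled regression removes the bias from a time-constant unobservable, and that in this simple linear case the resulting coefficient coincides with the within (Fixed Effects) estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
n_plots, n_years = 200, 2
n_obs = n_plots * n_years

plot_id = np.repeat(np.arange(n_plots), n_years)
ability = rng.normal(size=n_plots)  # unobserved, time-constant c_i

# N use is higher for more able farmers, so N is correlated with c_i.
n_rate = 50 + 20 * ability[plot_id] + rng.normal(scale=10, size=n_obs)
# True yield response to N is 15 in this simulation.
y = 1000 + 15 * n_rate + 300 * ability[plot_id] + rng.normal(scale=100, size=n_obs)


def group_mean(x, ids):
    """Plot-level mean of x, repeated for every observation of that plot."""
    return (np.bincount(ids, weights=x) / np.bincount(ids))[ids]


def ols(X, y):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


ones = np.ones(n_obs)
n_bar = group_mean(n_rate, plot_id)

# Pooled OLS ignores c_i and is biased upward here.
beta_pooled = ols(np.column_stack([ones, n_rate]), y)[1]
# Mundlak device: add the plot-level mean of N as an extra regressor.
beta_cre = ols(np.column_stack([ones, n_rate, n_bar]), y)[1]
# Within (Fixed Effects) estimator: regress demeaned y on demeaned N.
beta_fe = ols((n_rate - n_bar)[:, None], y - group_mean(y, plot_id))[0]
```

In applied work the vector of plot-level means would cover all time-varying covariates, and the appeal of the Correlated Random Effects approach is that, unlike Fixed Effects, it still allows coefficients on time-constant variables to be estimated.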