Archived December 11, 2018

The USGS Land Cover Institute (LCI)

Accuracy Assessment of 1992 National Land Cover Data

Methods and Results



A consortium of several federal agencies was formed to produce a consistent and seamless National Land Cover Dataset (NLCD) for the conterminous United States (Loveland and Shaw, 1996). Land cover mapping has been conducted for each of ten geographic regions using early 1990s Landsat Thematic Mapper (TM) imagery augmented by a suite of ancillary geospatial data sets. Briefly, the NLCD was compiled through unsupervised clustering (Kelly and White, 1993) of Landsat TM data. The resulting spectral clusters were resolved into one of 21 thematic classes using logical modeling and, as required, ancillary data sources (e.g., census data and slope, aspect, and elevation). The 21 thematic classes resemble the well-established Anderson land use/land cover classification system (Anderson et al., 1976). Details of the classification process are discussed in Vogelmann et al. (1998a, 1998b).

As land cover mapping of each region is completed, its thematic accuracy is evaluated. Here we briefly report the methods and results of the accuracy assessment to date (October 2000) for four geographic regions in the eastern United States: New England, New York/New Jersey, mid-Atlantic, and southeast, referred to as regions 1, 2, 3, and 4, respectively. Accuracy assessment of the remaining regions is in progress, and we anticipate completing it for all regions by late 2001.

Accuracy Assessment Methods

The accuracy assessment of the NLCD consists of

    1) a probability sampling design;
    2) a response design for reference data evaluation; and
    3) an analysis procedure for estimation of accuracy parameters.

The sampling design incorporated three layers of stratification and a two-stage cluster sampling protocol (Stehman et al., 2000). Each mapping region (New England, New York/New Jersey, mid-Atlantic, southeast) constituted a stratum and was sampled independently. Within each mapping region, geographic strata were created using 15' x 15' or 30' x 30' grid cells, depending on the size of the region. Primary sampling units (PSUs), defined as the non-overlapping interior regions of 1990-vintage aerial photographs acquired by the National Aerial Photography Program (NAPP), were then delineated within these strata. These PSUs partition each region into units of nearly equal area. A single PSU was then randomly selected from each grid cell, with all PSUs having an equal probability of being selected (Fig. 1). In the second stage, the pixels within the selected PSUs were stratified by mapped land-cover class, and a simple random sample of pixels was selected independently for each land-cover class. Across the entire mapping region, approximately 80 to 100 samples were selected for each land cover type.
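The two-stage selection within one mapping region can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's software: the function `two_stage_sample` and its input layout (a dict from grid-cell stratum to a list of PSUs, each PSU a list of `(pixel_id, mapped_class)` pairs) are hypothetical, the per-class sample size is a parameter standing in for the roughly 80 to 100 samples per class described above, and classes with fewer available pixels than requested are sampled in full (also an assumption).

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Pixel = Tuple[Hashable, int]  # (pixel_id, mapped_class); hypothetical layout


def two_stage_sample(grid_cells: Dict[Hashable, List[List[Pixel]]],
                     n_per_class: int, seed: int = 0):
    """Hypothetical sketch of the two-stage design for a single mapping region.

    Stage 1: pick one PSU at random (equal probability) in each grid cell.
    Stage 2: pool the pixels of the selected PSUs, stratify them by mapped
    class, and draw a simple random sample of up to n_per_class per class.
    """
    rng = random.Random(seed)
    # Stage 1: index of the selected PSU in each geographic stratum.
    selected = {cell: rng.randrange(len(psus)) for cell, psus in grid_cells.items()}

    # Stage 2: stratify the pixels of the selected PSUs by mapped class.
    by_class: Dict[int, List[Hashable]] = defaultdict(list)
    for cell, k in selected.items():
        for pixel_id, mapped_class in grid_cells[cell][k]:
            by_class[mapped_class].append(pixel_id)

    # Independent simple random sample (without replacement) in each class.
    sample = {cls: rng.sample(ids, min(n_per_class, len(ids)))
              for cls, ids in by_class.items()}
    return selected, sample
```

Because each mapping region is its own stratum, a full run would call a function like this once per region.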

To obtain the reference classification, each sample pixel was located on a hard-copy NAPP aerial photograph, using a drape of the sample point over a three-band Landsat composite image to provide geo-reference (see Fig. 2 for an example). The photointerpreters recorded a suite of reference information for each sample (Table 1): a primary land-cover label; an alternate land-cover label, provided only when appropriate (e.g., low intensity residential as the primary label and urban grass as the alternate); land-cover heterogeneity in the vicinity of the sample unit; and a confidence rating for the photo-interpreted label. Note that the alternate label was not used in region 3 photo-interpretation and was rarely used in region 1. For a more detailed discussion of reference data collection and evaluation, refer to Yang et al. (2000) and Zhu et al. (in press).
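One way to hold the reference information for a sample, together with the "primary or alternate" agreement rule used in the accuracy comparisons, is a small record type. The sketch below is hypothetical: the class name, field names, and the codings of heterogeneity and confidence are assumptions (the exact Table 1 categories are not reproduced here), and the example assumes NLCD 1992 class codes 21 (low intensity residential) and 85 (urban/recreational grasses).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReferenceRecord:
    """Hypothetical container for the photo-interpreted data on one sample pixel."""
    sample_id: str
    primary: int                          # primary reference land-cover class code
    alternate: Optional[int] = None       # alternate label, recorded only when appropriate
    heterogeneity: Optional[str] = None   # assumed coding, e.g. "homogeneous"/"mixed"
    confidence: Optional[int] = None      # assumed ordinal confidence rating

    def agrees(self, map_class: int, use_alternate: bool = True) -> bool:
        """True if the map label matches the primary label or, when allowed,
        the alternate label (if one was recorded)."""
        if map_class == self.primary:
            return True
        return use_alternate and self.alternate is not None and map_class == self.alternate


record = ReferenceRecord("s1", primary=21, alternate=85)
print(record.agrees(85))                       # matches the alternate label
print(record.agrees(85, use_alternate=False))  # primary-only comparison
```

A sample with no alternate label can only agree through its primary label, which is why regions that recorded few alternates yield more conservative estimates.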

For each mapping region, stratified sampling formulas were applied to estimate the error matrix cell proportions (Stehman and Czaplewski, 1998) and, from these, the overall accuracy and the class-specific user's and producer's accuracies (Story and Congalton, 1986). Stratified formulas are necessary because the sample was stratified by mapped land-cover class with roughly equal sample sizes per class, so the raw sample proportions do not reflect the area occupied by each class. Accuracy results were therefore computed by weighting the cell proportions by the proportion of each land cover class within a given region. Specifically, overall accuracy and producer's accuracy are estimated using poststratified formulas, which use the known pixel total for each mapped land-cover class ($N_{i+}$) and treat the sample as a stratified random sample of $n_{i+}$ pixels from the $N_{i+}$ pixels in that class, whereas user's accuracy is based on the simple random sampling formula. With $n_{ij}$ denoting the number of sample pixels mapped as class $i$ and labeled class $j$ in the reference data, $q$ the number of classes, and $N = \sum_{i=1}^{q} N_{i+}$ the total number of pixels in the region, the overall accuracy $\hat{O}$, producer's accuracy $\hat{P}_j$, and user's accuracy $\hat{U}_i$ are

$$\hat{O} = \sum_{k=1}^{q} \frac{N_{k+}}{N}\,\frac{n_{kk}}{n_{k+}}$$

$$\hat{P}_j = \frac{N_{j+}\, n_{jj}/n_{j+}}{\sum_{i=1}^{q} N_{i+}\, n_{ij}/n_{i+}}$$

$$\hat{U}_i = \frac{n_{ii}}{n_{i+}}$$
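The point estimators can be sketched in Python as follows. This is a minimal illustration assuming a sample error matrix with map classes as rows and reference classes as columns, known mapped-pixel totals $N_{i+}$, and at least one sample in every row and every estimated column; the function name `poststratified_accuracy` is hypothetical. It does not attempt to reproduce the variance estimation required for the two-stage cluster design.

```python
from typing import List, Sequence, Tuple


def poststratified_accuracy(n: Sequence[Sequence[int]],
                            N_plus: Sequence[float]) -> Tuple[float, List[float], List[float]]:
    """Hypothetical sketch of the stratified point estimators.

    n[i][j]   : sample count, map class i (row), reference class j (column)
    N_plus[i] : known number of pixels mapped as class i in the region
    Returns (overall accuracy, user's accuracies, producer's accuracies).
    """
    q = len(n)
    N = sum(N_plus)
    n_row = [sum(row) for row in n]  # n_{i+}
    # Estimated population cell proportions: p_ij = (N_{i+}/N) * (n_ij / n_{i+}).
    p = [[(N_plus[i] / N) * (n[i][j] / n_row[i]) for j in range(q)] for i in range(q)]
    overall = sum(p[k][k] for k in range(q))
    # User's accuracy: simple random sampling formula within each map stratum.
    users = [n[i][i] / n_row[i] for i in range(q)]
    # Producer's accuracy: diagonal proportion over the estimated reference column total.
    col = [sum(p[i][j] for i in range(q)) for j in range(q)]
    producers = [p[j][j] / col[j] for j in range(q)]
    return overall, users, producers
```

For a toy two-class sample `n = [[8, 2], [1, 9]]` with `N_plus = [600, 400]`, the overall accuracy is 0.6 x 0.8 + 0.4 x 0.9 = 0.84, the user's accuracies are 0.8 and 0.9, and the producer's accuracies are 0.48/0.52 (about 0.923) and 0.36/0.48 = 0.75. Omission and commission error rates follow as one minus the producer's and user's accuracies, respectively.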

The producer's accuracy is the probability that a pixel of a given reference class (the photo-interpreted land cover class in this project) is correctly mapped; its complement (1 - producer's accuracy) is the error of omission. In contrast, the user's accuracy is the probability that a pixel mapped as a given class actually belongs to that class in the reference data (the photo-interpreted land cover class in this project); its complement (1 - user's accuracy) is the error of commission.

Accuracy estimates using different definitions of agreement

Accuracy results are reported using several definitions of agreement between the map label and the primary or alternate reference land cover labels. The most restrictive protocol is a direct comparison of the photo-interpreted label of each sample pixel with the corresponding map label (pixel-to-pixel comparison). It reflects a 'conservative bias' (Verbyla and Hammond, 1995) because true classification error is confounded with errors attributable to misregistration or to the inability to confidently photo-interpret a sample unit. The results of this comparison are also affected by temporal differences between Landsat TM data and NAPP photo acquisition. The second definition of agreement counts a match when the photo-interpreted label of a sample pixel is the most common class within a 3x3 pixel block centered on the sample pixel (mode comparison; see Figure 3 for an example and explanation of the two agreement protocols). This comparison recognizes that, for many applications, some spatial generalization of the full-resolution (30 m) land cover data is appropriate, so accuracy estimates based on the most common class within the 3x3 block should be useful for such applications. A third set of accuracy estimates is derived from a subset of the original samples: those whose sample pixel lies within a homogeneous area, i.e., only one land cover class occurs within the 3x3 pixel block. Estimates based on this subset likely have an 'optimistic bias' (Hammond and Verbyla, 1996) because they are restricted to homogeneous areas, where land cover is generally easily identified.
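The three agreement protocols can be expressed as simple tests on the 3x3 block of mapped classes around each sample pixel. This Python sketch is illustrative only: the function names are hypothetical, the block is assumed to be a 3x3 nested list with the sample pixel at the center, and the tie rule for the mode comparison (agreement if the reference label is any of the tied most-common classes) is an assumption, since the text does not say how ties were resolved.

```python
from collections import Counter
from typing import Sequence

Block = Sequence[Sequence[int]]  # 3x3 mapped class codes; sample pixel at [1][1]


def pixel_agreement(block: Block, ref: int) -> bool:
    """Pixel-to-pixel comparison: the center map label must equal the reference label."""
    return block[1][1] == ref


def mode_agreement(block: Block, ref: int) -> bool:
    """Mode comparison: the reference label must be the most common class in the block.
    Ties are treated as agreement with any tied class (an assumption)."""
    counts = Counter(c for row in block for c in row)
    top = max(counts.values())
    return ref in {c for c, k in counts.items() if k == top}


def is_homogeneous(block: Block) -> bool:
    """Homogeneous subset: only one class occurs within the 3x3 block."""
    return len({c for row in block for c in row}) == 1
```

For a block whose center is mapped as evergreen forest (42) but whose other cells are mostly deciduous forest (41), a reference label of 41 fails the pixel-to-pixel test yet passes the mode comparison. For homogeneous blocks the two comparisons always give the same answer, which is one reason the homogeneous subset tends to show higher agreement.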

Results and Discussion

Result Tables

* New regional Accuracy Assessment results for EPA Federal Regions 5-10. The detailed procedures used to obtain the accuracy assessment results found in Regions 5-10 will be available soon. For further information, contact lci@usgs.gov.

Tables 2 through 5 present the overall and class-specific accuracy estimates for each of the four regions in the eastern US using the three comparison methods discussed previously. Table 6 lists the land cover categories most frequently confused between the mapped and photo-interpreted results. In most cases, confusion occurs between related classes, i.e., among the three urban land use classes, among the three forest classes, and between row crops and hay/pasture. This is also reflected in the marked improvement in accuracy estimates when the land cover data of each region are aggregated to approximately the USGS Anderson Level I classification system (see Table 7). However, a few land cover types are confused with many other classes. Transitional barren, defined as an area dynamically changing from one land cover to another because of land use activities, is an example. Confusion also exists between the two barren classes and the forest and grassland classes in the mid-Atlantic and southeast regions.
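Why aggregation raises accuracy can be seen by collapsing an estimated error matrix: confusions between classes that fall in the same Level I group move onto the diagonal. The Python sketch below is a hypothetical illustration; it aggregates estimated population cell proportions (not raw sample counts, which are unequally weighted), assumes that grouping NLCD codes by their tens digit approximates Anderson Level I (e.g., 41/42/43 forest, 81/82 planted/cultivated), and uses toy proportions rather than NLCD results.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def aggregate_matrix(p: Sequence[Sequence[float]], classes: Sequence[int],
                     group_of: Callable[[int], int]) -> Tuple[List[int], List[List[float]]]:
    """Sum estimated cell proportions p[i][j] (map i, reference j) into class groups."""
    groups = sorted({group_of(c) for c in classes})
    index: Dict[int, int] = {g: k for k, g in enumerate(groups)}
    agg = [[0.0] * len(groups) for _ in groups]
    for i, ci in enumerate(classes):
        for j, cj in enumerate(classes):
            agg[index[group_of(ci)]][index[group_of(cj)]] += p[i][j]
    return groups, agg


def overall(p: Sequence[Sequence[float]]) -> float:
    """Overall accuracy: sum of the diagonal proportions."""
    return sum(p[k][k] for k in range(len(p)))


def level1(code: int) -> int:
    """Assumed approximation of Anderson Level I: the tens digit of the class code."""
    return code // 10


# Toy example (hypothetical proportions): two forest and two agricultural classes.
classes = [41, 42, 81, 82]
p = [[0.20, 0.05, 0.01, 0.00],
     [0.04, 0.15, 0.00, 0.01],
     [0.00, 0.01, 0.18, 0.07],
     [0.01, 0.00, 0.08, 0.19]]
groups, agg = aggregate_matrix(p, classes, level1)
```

In this toy matrix the overall accuracy rises from 0.72 to 0.96, because the forest-forest and hay/pasture-row crop confusions count as agreement once the classes are merged.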

Major factors that have contributed to disagreements between mapped land cover and reference land cover labels include:

    1) Landsat TM data quality and mapping error,
    2) the time difference between source imagery and reference data acquisition (affecting the hay/pasture, row crop, wetland, and transitional classes),
    3) class definitions related to land use (high intensity residential and urban built-up, and the two barren classes), and
    4) spatial uncertainty, such as geo-registration error.

An example of mapping error is the limited success in discriminating hay/pasture from row crops using leaf-off season (spring or fall) Landsat TM data. The data analyst assumes that there is a temporal window during which hay and pasture green up before most other annual or perennial vegetation. However, if leaf-off data acquisition is not temporally ideal (e.g., the greenness level of hay/pasture areas is low), it may result in misclassification between hay/pasture and other agricultural lands.

Another source of error is the difference between the acquisition dates of the satellite imagery and the NAPP photographs. The NAPP photographs were acquired between the late 1980s and 1997, whereas the satellite data were mostly acquired from 1991 to 1993. Any landscape changes that took place during this period complicate interpretation and comparison of the reference and mapped land cover. The class most affected is transitional barren, which is designed for conditions such as temporary clearing and regeneration of forest cover. Similar problems occur within the agricultural classes because of crop rotation.

Low accuracy for classes defined by land use is to be expected. Despite the extensive use of ancillary data such as census data, it is very difficult to unambiguously separate high intensity residential from other urban uses, during either the mapping or the photo-interpretation process. The same is true for the land-use-based distinction between the quarry/strip mine class and the sandy/gravel class.


Accuracy assessments were conducted separately for each region; that is, each region was treated as a separate population from which samples were drawn. Therefore, all accuracy estimates reported here apply to the region as a whole and are not necessarily valid for any given subdivision within the region. However, sub-region accuracy estimates could be obtained, if desired, from the existing sample units and/or additional samples using an appropriate estimation procedure.

Because photo-interpretation of the reference data was conducted by different groups and contractors for each region, the information collected (as listed in Table 1) is not identical across regions. For example, photo-interpretation in region 1 (New England) and region 3 (mid-Atlantic) collected little or no information on alternate land cover labels. Hence the accuracy estimates for these two regions should be regarded as conservative, because nearly all of their sample points could be matched only against the primary land cover label. Deviations from the general procedures by specific regions, and their possible effects on accuracy estimation, are documented separately.


A National Land Cover Dataset (NLCD) for the conterminous U.S. has been compiled by the U.S. Geological Survey (USGS) as part of a cooperative project between the USGS and the U.S. Environmental Protection Agency (EPA). The data set is a nationally consistent land cover product derived from early to mid-1990s Landsat Thematic Mapper satellite imagery. Its accuracy has been assessed region by region using a rigorous, probability-based sampling approach. The results indicate that the NLCD is able to meet data requirements for applications at the regional to continental scale, which is the primary objective of the mapping project (Yang et al., in press). The data set is currently being used for a wide variety of large-area applications, including watershed management, environmental inventories, fire risk assessment, and land management.

NLCD users are encouraged to use the data in spatially aggregated form (e.g., 3x3 or 5x5 pixel blocks) whenever possible, which alleviates the "salt and pepper" effect present in the original full-resolution product. Similarly, if a generalized land cover classification scheme (e.g., Anderson Level I) meets the application requirements, it is wise to aggregate the NLCD classes accordingly. Users who intend to apply the data to highly localized studies, such as a small urban-suburban setting or a watershed of only tens of square miles, should proceed with caution. The quality of the land cover data over such a small geographic extent is unknown, and users should carefully examine the NLCD product in the local context to determine its utility.
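Both kinds of aggregation recommended above can be sketched in a few lines of Python. This is a hypothetical illustration, not a USGS tool: `block_mode` assigns each non-overlapping k x k block its most common class (a moving-window majority filter is an alternative reading of "pixel blocks"), partial blocks at the grid edges are aggregated from the pixels they contain, ties go to the smallest class code (an arbitrary rule), and `to_level1` assumes the tens digit of an NLCD class code approximates its Anderson Level I group.

```python
from collections import Counter
from typing import List, Sequence


def block_mode(grid: Sequence[Sequence[int]], k: int = 3) -> List[List[int]]:
    """Aggregate a 2-D grid of class codes into non-overlapping k x k blocks,
    giving each block its most common class (ties -> smallest code, an assumption)."""
    rows, cols = len(grid), len(grid[0])
    out: List[List[int]] = []
    for r0 in range(0, rows, k):
        out_row = []
        for c0 in range(0, cols, k):
            # Partial blocks at the right/bottom edges use only the pixels they contain.
            counts = Counter(grid[r][c]
                             for r in range(r0, min(r0 + k, rows))
                             for c in range(c0, min(c0 + k, cols)))
            top = max(counts.values())
            out_row.append(min(cls for cls, n in counts.items() if n == top))
        out.append(out_row)
    return out


def to_level1(grid: Sequence[Sequence[int]]) -> List[List[int]]:
    """Reclassify codes to an assumed Level I grouping (tens digit, e.g. 41/42/43 -> 4)."""
    return [[code // 10 for code in row] for row in grid]
```

Applied to a 6x6 grid of deciduous forest (41) containing two isolated pixels of other classes, `block_mode` with k = 3 returns a 2x2 grid of 41, removing the isolated "salt and pepper" pixels; `to_level1` then maps it to forest (4).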


References

Anderson, J.R., E.E. Hardy, J.T. Roach, and R.E. Witmer. 1976. A land use and land cover classification system for use with remote sensor data, U.S. Geological Survey Professional Paper 964, U.S. Geological Survey, Washington, DC, 28 pp.

Hammond, T.O. and D.L. Verbyla. 1996. Optimistic bias in classification accuracy assessment. International Journal of Remote Sensing, 17:1261-1266.

Kelly, P.M. and J.M. White. 1993. Preprocessing remotely sensed data for efficient analysis and classification. Applications of Artificial Intelligence 1993: Knowledge-Based Systems in Aerospace and Industry, Proceedings of SPIE, pp. 24-30.

Loveland, T.R., and D.M. Shaw. 1996. Multiresolution land characterization: building collaborative partnerships, Gap Analysis: A Landscape Approach to Biodiversity Planning (J.M. Scott, T. Tear, and F. Davis, editors), Proceedings of the ASPRS/GAP Symposium, Charlotte, North Carolina, National Biological Service, Moscow, Idaho, pp 83-89.

Smith, J.H., J.D. Wickham, S.V. Stehman and L. Yang. 2002. Impacts of Patch Size and Land Cover Heterogeneity on Thematic Image Classification Accuracy. Photogrammetric Engineering and Remote Sensing, 68:65-70.

Smith, J.H., S.V. Stehman, J.D. Wickham, and L. Yang. 2003. Effects of landscape characteristics on land-cover class accuracy. Remote Sensing of Environment 84:342-349.

Stehman, S.V. and R.L. Czaplewski. 1998. Design and analysis for thematic map accuracy assessment: Fundamental principles. Remote Sensing of Environment, 64:331-344.

Stehman, S.V., J.D. Wickham, L. Yang, and J.H. Smith. 2000. Assessing the accuracy of large-area land cover maps: Experiences from the Multi-resolution Land-cover Characteristics (MRLC) Project. Proceedings of the 4th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, Delft University Press, The Netherlands, 601-608.

Stehman, S.V., J.D. Wickham, J.H. Smith, and L. Yang. 2003. Thematic accuracy of the 1992 National Land-Cover Data (NLCD) for the eastern United States: statistical methodology and regional results. Remote Sensing of Environment, 86:500-516.

Story, M. and R.G. Congalton. 1986. Accuracy assessment: A user's perspective. Photogrammetric Engineering and Remote Sensing, 52(3):397-399.

Verbyla, D.L. and T.O. Hammond. 1995. Conservative bias in classification accuracy assessment due to pixel-by-pixel comparison of classified images with reference grids. International Journal of Remote Sensing, 16:581-587.

Vogelmann, J.E., T. Sohl, P.V. Campbell, and D.M. Shaw. 1998a. Regional land cover characterization using Landsat Thematic Mapper data and ancillary data sources. Environmental Monitoring and Assessment, 51:415-428.

Vogelmann, J.E., T. Sohl, and S.M. Howard. 1998b. Regional characterization of land cover using multiple sources of data. Photogrammetric Engineering and Remote Sensing, 64(1):45-57.

Wickham, J.D., S.V. Stehman, J.H. Smith, T.G. Wade, and L. Yang. 2004. A priori evaluation of two-stage cluster sampling for accuracy of large-scale land-cover maps. International Journal of Remote Sensing, 1235-1252.

Wickham, J.D., S.V. Stehman, J.H. Smith, and L. Yang. accepted (pending revision). Thematic accuracy of MRLC-NLCD land cover for the western United States. Remote Sensing of Environment.

Yang, L., S.V. Stehman, J.H. Smith, and J.D. Wickham. 2001. Short Communication: Thematic accuracy of MRLC land-cover for the eastern United States. Remote Sensing of Environment, 76:418-422.

Yang, L., S.V. Stehman, J.D. Wickham, J.H. Smith, and N.J. Van Driel. 2000. Thematic validation of land cover data of the eastern United States using aerial photography: Feasibility and challenges. Proceedings of the 4th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, Delft University Press, The Netherlands, 747-754.

Zhu, Z., L. Yang, S.V. Stehman, and R.L. Czaplewski. in press. Accuracy assessment for the U.S. Geological Survey regional land cover mapping program: New York and New Jersey Region. Photogrammetric Engineering and Remote Sensing.