Case Study

Validation of Very Large Polygons

Examples of Very Large Polygons

Mapping wetland areas can result in large, complex polygons. An example of a such a polygon is shown below. It is the representation of a swamp in Northeastern British Columbia, as captured by the TRIM project. The complexity of this polygon is indicated by the following statistics:

Total # points # points in shell # holes
598,093 150,781 6,937

The WKT file for this polygon is over 20 Meg in size!


Figure 1. A very large wetland polygon (Polygon E)

This image is taken from a popular GIS application. The application loaded and displayed the polygon without displaying any error messages, even though (as we discovered) this polygon in fact contains a topology error.

Using JTS to Validate Large Polygons

JTS provides the isValid() function to validate the correctness of Geometries. In the case of validating polygons, several potentially expensive tests are performed to check for line segment intersections and to test that holes are inside the polygon shell. If done using naive algorithms these tests are O(n2 ), which results in extremely long processing times for large inputs. JTS uses spatial indexing techniques to greatly increase the speed of the intersection and point-in-polygon algorithms.

The Java code to read the WKT datafiles and perform the validation with JTS is quite simple:

    FileReader fRdr = new FileReader("largeIntersectPoly.wkt.txt");
    Geometry geom = wktRdr.read(fRdr);
    fRdr.close();
    IsValidOp validOp = new IsValidOp(geom);
    TopologyValidationError err = validOp.getValidationError();
    if (err != null) System.out.println(err);

Listing 1. Java code to read a geometry and perform validation

We used a series of large polygons to test the performance of the JTS isValid() function. Statistics on the polygons used and their processing times are shown below:

Image Name Total # points # points in shell # holes Time to read WKT Time to run validation
A 10,936 7,158 30 5.136 s .1643 s
B 30,498 11,886 332 15.33 s .601 s
C 94,150 40,320 789 47.533 s 1.962 s
D 163,122 31,987 3,276 x s 4.406 s
E 598,093 150,781 6,937 311.11 s 14.689 s
Note 1: Testing performed on a Pentium 1.1 GHz machine running Java 1.3 under Windows 2000.
Note 2: Time for Polygon E is less than expected, due to a topology error being detected.

One immediate observation is WKT is not the ideal format for storing large amounts of spatial data! The validation processing times themselves are very respectable.

Finding & Displaying Topology Errors

When JTS attempted to validate Polygon E it discovered a self-intersection! The output of the validation program is shown below.

Self-intersection at or near point (1228528.7382695903, 1589070.7373000782, NaN)

Figure 2. The error reported by the validation program

Because this polygon is so large, it is not possible to load it directly into the JTS TestBuilder. However, by cutting out the portion containing the error into an extract file it is possible to use the TestBuilder to validate the extracted polygon and display the location of the self-intersection.

   
Figure 3. A sequence of screenshots from the JTS TestBuilder displaying the self-intersection

Downloadable Files