Like most other mapping agencies, the New Mexico Bureau of Geology and Mineral Resources (NMBGMR) has produced geologic maps for many years using a Geographic Information System (GIS). A GIS is essentially a geospatial database that stores information about the shape and position of mapped features as well as associated data. In order for a particular GIS-based map to be interoperable with other maps, their geospatial databases must be organized with a consistent structure. A data model is a standardized database structure (also called a database schema) that defines what features (or entities) are recorded, what their attributes are (often with a pre-defined set of possible values), and how they relate to one another.
The NMBGMR began investigating the implementation of a new geologic data model when the limitations of our existing rudimentary model became apparent. Our early model was based on an early USGS schema and was very basic. It was sufficient to attribute simple orientation data, contacts and faults (solid, dashed, dotted and queried), and polygons (attributed with a map unit abbreviation). This allowed us, in most cases, to create a paper plot (or PDF) of a useable geologic map. However, this system did not allow us to store detailed data regarding geologic features or to adequately show geologic relationships between them. For instance, planar fault orientation data was recorded as a separate record from linear slickenline data on the same fault and the topological relationship between them was lost. It was also not possible to record kinematic information regarding fault movement in the old model. Because of these and many other similar problems, the ability to analyze these geologic map spatial databases was somewhat limited and topological relationships between geologic features were lost or were hard to recreate when needed. We also had problems when attempting to compile geologic maps of larger areas because each quadrangle often had a 'custom' (i.e. seat-of-the-pants) data model applied as particular data types were encountered. This was an inconvenience when the number of GIS-based geologic maps was relatively small. However, as the number of GIS map projects multiplied, it became clear that we needed a more sophisticated geologic data model, both for consistency between maps, and to allow our geologists to record a wider variety of geologic information. It also became apparent that we needed a way to store feature level metadata, particularly the source ID for features compiled from other maps.
When we decided we needed a better data model, we first looked at existing geologic data models. At the time, we found that existing models were either too complex to be practical or otherwise didn't fit our needs. So, we chose to create our own model from scratch, borrowing useful ideas from other models. Since both field-mapping and digitization of maps are already fairly labor-intensive, we didn't want to add needless complexity to the process of producing maps. However, we did want to have the ability to create a fully attributed geologic map in a GIS. The links in the banner above describe the feature classes and attributes of our model. An XML schema of our model is available for download.
The NMBGMR Geologic Map Data Model is expected to evolve as various portions of it are implemented.
v. 1.0.4
Before we began creating our model, we looked at geologic data models developed by other organizations. At the time, we looked closely at the North American Geologic Map Data Model and its variants and an early ESRI model. The complexity of these models, coupled with the fact that they had not been significantly implemented, suggested that they were unsuitable for our needs.
In response to difficulties encountered with the NADM, the USGS recently proposed a new draft standard: NCGMP09 for individual geologic map quadrangles. The NCGMP09 is much less complicated than the NADM, and is quite similar to the model that we developed in parallel. In some ways, our model is much more granular because we have separate feature classes for separate types of features (see comparison below).
As noted above, our model developed in tandem with both the NCGMP09 and ESRI models and shares several design features with them -- but also has some important differences:
We have far more granular feature classes in our model than the NCGMP09 model does and a different structure than the ESRI Geologic Mapping Template. The benefit of separate feature classes is that it is easier to create maps that display just the features of interest. For instance, if a structure map is needed, you can just display the faults, folds, and perhaps structure contour layers. To do this in the NCGMP09 model would require querying the data and perhaps exporting features to new feature classes to construct these derivative maps. Another benefit of feature classes dedicated to a particular type of feature is that attributes can be more specific for that feature type. Of course, the drawback of our approach is that having more feature classes can make geodatabases more complicated.
Traditionally, lines (generally contacts and faults) on printed geologic maps are either solid, dashed, dotted or queried. Solid lines were used to represent linear features that were confidently identified, well located, and exposed. Dotted lines were used for concealed features that a geologist felt reasonably confident in projecting beneath another unit. Queries along lines reflected decreased confidence about both existence and location. Dashed lines were more mysterious. Dashed lines could represent decreased confidence because a contact was mapped with binoculars or air photos, was poorly exposed, wasn't well located in areas of low relief, or was interpolated. The main problem with the standard line types used on paper geologic maps is that there are more inter-related attributes than can be effectively symbolized with such a simple system of line types.
We chose to attribute linear features using a combination of two attributes: 1) Exposure (exposed, obscured or intermittent, concealed) and 2) Scientific Confidence, which combines confidence regarding the existence and/or identification of a feature (certain, probable, uncertain). Note that positional (or locational) accuracy is not recorded for lines and does not affect our symbology (positional accuracy can be recorded for points along the line, however). There are several reasons for not attributing positional accuracy for linear features. First of all, it is much simpler not to: adding positional accuracy quickly makes it very difficult to create a workable field symbology for use on paper field maps. Also, if a line were attributed as having an 'accurate' location, the assumption on the part of map users is that the line has actually been surveyed -- the way a road or pipeline might be. This is very rarely the case for lines on geologic maps. Positional accuracy usually varies continuously along the linear features. This variability can better be expressed by recording this data at the specific points where the positional accuracy was actually measured.
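The two-attribute scheme described above can be sketched as a small data structure. This is only an illustration, not the NMBGMR schema itself: the class names, the field names, and the `accuracy_m` unit are assumptions, and only the permitted Exposure and Scientific Confidence values come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Permitted values as listed in the text; the constant names are assumptions.
EXPOSURE_VALUES = ("exposed", "obscured", "concealed")
CONFIDENCE_VALUES = ("certain", "probable", "uncertain")


@dataclass
class AccuracyPoint:
    """Hypothetical point along a line where positional accuracy was measured."""
    x: float
    y: float
    accuracy_m: float  # assumed unit: metres


@dataclass
class LinearFeature:
    """A contact or fault line carrying only Exposure and Scientific Confidence."""
    coords: List[Tuple[float, float]]
    exposure: str
    confidence: str
    # Positional accuracy is deliberately not a line attribute; it is
    # stored at the individual points where it was actually measured.
    accuracy_points: List[AccuracyPoint] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.exposure not in EXPOSURE_VALUES:
            raise ValueError(f"unknown exposure: {self.exposure!r}")
        if self.confidence not in CONFIDENCE_VALUES:
            raise ValueError(f"unknown confidence: {self.confidence!r}")
```

Keeping the accuracy measurements in a list of points lets the recorded accuracy vary along the line without adding a third attribute to the line itself.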
We have defined a number of important topology rules that should be valid for any geologic map. Most of these rules are obvious: no gaps between polygons, contacts must overlie polygon boundaries, contacts can't dangle, etc. These rules help identify and fix inevitable digitization errors. Other rules require that fold and fault measurements should lie on their respective line types or be marked as exceptions. These exceptions will additionally have an attribute "MappedFeature" set to FALSE so that they can easily be symbolized as minor structures.
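The rule that measurements must lie on their line type or be marked as exceptions can be sketched in plain Python. This is a simplified stand-in for a geodatabase topology check, not the tool we use: the dictionary keys `id` and `xy`, the function names, and the tolerance value are assumptions. Only the `MappedFeature` attribute comes from the text.

```python
import math


def distance_to_segment(p, a, b):
    """Shortest distance from point p to the segment a-b (all 2-D tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its ends.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def lies_on_line(point, polyline, tolerance):
    """True if the point is within tolerance of any segment of the polyline."""
    return any(
        distance_to_segment(point, a, b) <= tolerance
        for a, b in zip(polyline, polyline[1:])
    )


def check_measurements(measurements, fault_lines, tolerance=1.0):
    """Return ids of measurements that are off every line and not exceptions."""
    errors = []
    for m in measurements:
        on_line = any(lies_on_line(m["xy"], ln, tolerance) for ln in fault_lines)
        is_exception = m["MappedFeature"] is False
        if not on_line and not is_exception:
            errors.append(m["id"])
    return errors
```

A measurement that sits off every mapped fault is reported unless its `MappedFeature` value is FALSE, which marks it as a minor structure.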
A more fundamental topologic relationship exists for point data that can have measurements for both planar and linear data, like faults with slickenlines, fold axial planes and fold axes, foliations with extension lineations, or bedding with paleocurrent vectors. For all these types of features, planar and linear data reside in the same record. Of course, there are many ways to store such a relationship in a database, but this method is by far the simplest to see and understand when editing or viewing the database. Many other geologic data models store one point for a fault plane and another for the slickenline in that plane. It then becomes very difficult to extract this key data from what is really a single data point.
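As a rough illustration of why it helps to keep planar and linear data in one record, the sketch below stores a fault plane and its slickenline together and checks that the lineation actually lies in the plane. The class and field names are hypothetical and not the attribute names of our model. Angles are assumed to be in degrees, with the plane given as dip direction and dip, and the line as trend and plunge.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class FaultMeasurement:
    """One data point holding both the fault plane and its slickenline."""
    x: float
    y: float
    dip_direction: float  # azimuth of the dip direction, degrees
    dip: float            # degrees below horizontal
    slick_trend: Optional[float] = None   # azimuth of the lineation, degrees
    slick_plunge: Optional[float] = None  # degrees below horizontal

    def lineation_misfit(self) -> Optional[float]:
        """Angle in degrees between the lineation and the plane, or None."""
        if self.slick_trend is None or self.slick_plunge is None:
            return None
        d, dip = math.radians(self.dip_direction), math.radians(self.dip)
        t, p = math.radians(self.slick_trend), math.radians(self.slick_plunge)
        # Axes: x = east, y = north, z = up.
        normal = (math.sin(dip) * math.sin(d),
                  math.sin(dip) * math.cos(d),
                  math.cos(dip))
        line = (math.cos(p) * math.sin(t),
                math.cos(p) * math.cos(t),
                -math.sin(p))
        dot = sum(n * c for n, c in zip(normal, line))
        # A line lying exactly in the plane is perpendicular to the normal.
        return math.degrees(math.asin(min(1.0, abs(dot))))
```

With two separate point records, this kind of check would first require matching each slickenline to its fault plane by location.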
Our line feature classes are structured somewhat differently than other data models. Lithologic contacts are separated into two feature classes: Lith_Contacts and Concealed_Contacts. Additionally, faults are stored and fully attributed in Fault_line rather than being combined with contacts as in the NCGMP09 model. After faults are attributed, non-concealed faults (that participate in polygon topology) are copied to the Lith_Contacts layer where they will retain their LineClass attribute of 'fault'. Before building polygons, the Map_Boundary polyline is also copied to the Lith_Contacts layer and the topology is validated. Faults that dangle are deleted from the Lith_Contacts layer and any other topology problems are fixed. When there are no topology errors -- and no exceptions -- polygons can be built from the Lith_Contacts layer (and attributed using Lith_poly_label points if present).
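The preparation of the Lith_Contacts layer before polygon building can be sketched as follows. This is a simplified, plain-Python stand-in for the GIS workflow and makes several assumptions: lines are dictionaries with hypothetical `coords` and `Exposure` keys, all lines have already been split at their intersections so that they meet only at end nodes, and a "dangle" is a line end that no other line end shares. Only the `LineClass` value 'fault' and the layer names come from the text.

```python
from collections import Counter


def end_nodes(line):
    """Return the first and last vertex of a line."""
    return line["coords"][0], line["coords"][-1]


def drop_dangling_faults(lines):
    """Remove faults with an unshared end node, repeating until stable."""
    lines = list(lines)
    while True:
        counts = Counter(node for ln in lines for node in end_nodes(ln))
        keep = [
            ln for ln in lines
            if ln["LineClass"] != "fault"
            or all(counts[node] > 1 for node in end_nodes(ln))
        ]
        if len(keep) == len(lines):
            return keep
        lines = keep


def prepare_lith_contacts(lith_contacts, fault_lines, map_boundary):
    """Copy non-concealed faults and the map boundary in, then drop dangles."""
    copied_faults = [f for f in fault_lines if f["Exposure"] != "concealed"]
    return drop_dangling_faults(
        list(lith_contacts) + copied_faults + list(map_boundary)
    )
```

The original Fault_line records are left untouched; only the copies in the working Lith_Contacts list are filtered, so the fully attributed faults, including the dangling ones, remain available for display.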
We chose to proceed from very general lithologic attributes to more specific attributes:
The most specific lithologic attributes are divided into PrimaryLithology and SecondaryLithology which could either use uncontrolled terms or use the NGMDB vocabularies.
In addition to a long Text field for UnitDescription, we also include a ShortDescription field suitable for the map legend.
A geologic events table as specified in the NCGMP09 model is not currently part of our model. However, features like faults have attributes for Ancestry and LastActive. Our Lithology table has attributes for min/max/preferred age, GeneticEnvironment and DepositionalSystem. Having all geologic features linked to attributes about their geologic history sounds like it could be very useful, but that may be extremely difficult to implement.
Almost all attributes in our model are human-readable text. This is less space efficient than using numeric codes and precludes the use of ArcGIS subtypes, but it makes the database much easier to comprehend, especially if it has been exported to a shapefile. Many of these have coded value domains that provide pick lists for acceptable attributes. Aliases show the text used as the code within brackets to facilitate editing when more than one feature is attributed with the ArcMap field calculator. For instance, “obscured” is a code that has the alias: “[obscured]: Intermittantly exposed / obscured by colluvium”.
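The alias convention described above, with the code repeated in brackets at the start of the alias, can be sketched like this. Only the "obscured" entry is taken from the text. The other entries and all function names are hypothetical and are shown only to make the example complete.

```python
import re

# Coded value domain: code -> alias shown in the pick list.
# Only the "obscured" alias is from the text; the others are invented.
EXPOSURE_DOMAIN = {
    "exposed": "[exposed]: hypothetical alias text",
    "obscured": "[obscured]: Intermittantly exposed / obscured by colluvium",
    "concealed": "[concealed]: hypothetical alias text",
}


def code_from_alias(alias):
    """Extract the bracketed code from the start of an alias string."""
    match = re.match(r"\[([^\]]+)\]", alias)
    if match is None:
        raise ValueError(f"alias has no bracketed code: {alias!r}")
    return match.group(1)


def is_valid(value, domain):
    """True if the value is one of the codes in the domain."""
    return value in domain
```

Because the bracketed text matches the stored code exactly, someone reading the alias in a pick list knows what to type when attributing many features at once.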
Some feature classes like DataPoint are just containers for the location of point data that can be attributed in separate tables as needed. In general, however, most feature classes have a fairly comprehensive set of attributes. These could be expanded as the need arises. Another approach is to use extended attributes as used by the NCGMP09 and ESRI Geologic Mapping Template. This allows for uncommonly used attributes to be stored in a separate related table. In these models, one table is used for extended attributes for all feature classes by relying on user-maintained keys specifying the parent feature and the parent feature class. This approach seems difficult to manage if a large number of extended attributes are used. Perhaps feature classes could have rarely used and extended attributes in a dedicated table and use one-to-one relationship classes to maintain the link between features and attributes. For instance, Fabric_point could store rarely used attributes and extended attributes in a table called Fabric_pt_attr. This wouldn't rely on user-maintained database keys and would allow for automatic deletion of attribute data when the parent feature is deleted. The downside of this approach would be more tables in the geodatabase.
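The dedicated-table idea can be illustrated with SQLite standing in for a geodatabase relationship class. This is a sketch of the concept only, not how a geodatabase is implemented: the column names are hypothetical, and only the table names Fabric_point and Fabric_pt_attr come from the text.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with a one-to-one attribute table."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE Fabric_point (
            fabric_id   INTEGER PRIMARY KEY,
            fabric_type TEXT NOT NULL          -- hypothetical column
        );
        -- Using the parent key as this table's primary key makes the
        -- relationship one-to-one; the cascade removes the attribute
        -- row when its parent feature is deleted.
        CREATE TABLE Fabric_pt_attr (
            fabric_id INTEGER PRIMARY KEY
                REFERENCES Fabric_point(fabric_id) ON DELETE CASCADE,
            method    TEXT,                    -- hypothetical column
            notes     TEXT                     -- hypothetical column
        );
        """
    )
    return conn
```

Here the database itself maintains the link, so an attribute row cannot outlive or exist without its parent feature, which is the behavior the one-to-one relationship class is meant to provide.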
We don't currently encode the correlation of map units diagram within our geodatabases. The flexibility of producing custom diagrams in a graphics program is somewhat offset by the utility of standardization as advocated by the NCGMP09. The encoding scheme in the NCGMP09 seems rather complicated for practical use however.
We have set up geodatabase relationship classes between features and stand-alone tables. For instance, Lith_poly units are in a relationship class with the Lithology table. Relationship classes have the advantage over standard database joins or relates that the relationship is stored in the geodatabase itself and not in the ArcMap MXD file.
We include feature classes for creating a bedrock map beneath alluvium/colluvium or other cover. These derivative maps are very useful for hydrologic modeling, basin analysis, and other geotechnical projects.
When we began constructing our model, cartographic representations were not available in ArcMap and symbology was (and still is) limited to combinations of three attributes at a time. Our feature classes were designed with this in mind. Many feature classes had somewhat generic attributes based on a Class, Type, SubType attribute hierarchy. This has evolved somewhat over time, but we have tried wherever possible to limit the number of attributes that must be considered to define symbology to three.
Cartographic representations are another approach to symbology but require that all symbology be abstracted from a code. They also require orientation of symbols to be attributed opposite to azimuth conventions on geologic maps. One problem with the FGDC standard and the ESRI approach to using cartographic representations of it is that a number of the symbol codes refer to multiple features that should be symbolized separately. For instance:
2.11.13 | Lineation on inclined fault surface—Tick shows fault dip value and direction; arrow shows bearing and plunge of lineation |
While the fault plane and lineation (slickenline) are both fundamentally part of one data point measurement (see above), they need to be symbolized with two instances of the data. Of course, there are individual FGDC codes for each of these elements, but it might be useful to eliminate the FGDC codes that aren't granular enough to apply to individual features and data types. Another problem with the code approach is that it would be very easy for the code to fall out of sync with the actual attributes of the feature, which would be very confusing to end-users. One way to get around this problem would be to have separate joined tables that allow determination of symbol codes based on attributes. This has the added benefit of separating style from content the same way that standards-compliant HTML encodes content while CSS applies styles for Web pages.
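The joined-table idea can be sketched as a lookup keyed on feature attributes. Everything here is illustrative: the symbol codes are placeholders rather than real FGDC codes, the function name is hypothetical, and the field names `Exposure` and `Confidence` are assumed (only `LineClass` is named in the text).

```python
# Hypothetical lookup table: attribute combination -> symbol code.
# The codes are placeholders, not entries from the FGDC standard.
SYMBOL_TABLE = {
    ("fault", "exposed", "certain"): "placeholder-01",
    ("fault", "concealed", "certain"): "placeholder-02",
    ("contact", "exposed", "certain"): "placeholder-03",
}


def symbol_code(feature, table=SYMBOL_TABLE):
    """Derive the symbol code from the feature's attributes at display time."""
    key = (feature["LineClass"], feature["Exposure"], feature["Confidence"])
    return table.get(key, "unsymbolized")
```

Since the code is looked up from the attributes rather than stored on the feature, editing an attribute changes the symbol automatically, and restyling a map only requires swapping the lookup table.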
An extensive revision of how our model handles well locations and data related to wells was made at version 1.0.3. Those who already have a schema that suits their needs for other geologic data could use just the well-related part of our data model. The well schema became complex enough that we decided to detach it from the rest of the geologic data model. Please see the discussion on the NM Wells Data Model page for more information.
Eventually, some consensus will probably be reached and a single geology data model will be widely adopted and be interoperable with GeoSciML. This will make it much easier for anybody who tries to produce compilations, create derivative maps, or to perform spatial analyses of existing geologic map data. While our model has been working reasonably well for us, we don't presume that it will be the model adopted. Nonetheless, we do hope that some of the ideas presented by this model will be borrowed by other models -- just as we have done.