This project has moved and is read-only. For the latest updates, please go here.

Database Schema and Object Model

Nov 25, 2009 at 4:18 PM

Based on the recent code review of HydroDesktop, David Valentine has proposed an object-oriented model. This model is more generic than the current HydroDesktop schema, WaterML or ODM. Main classes in the model are Series, Site, Variable and DataValue. Instead of accessing the database or WaterML files directly, HydroDesktop plug-ins will be able to work with the generic object model. The database abstraction layer will handle conversion between WaterML 1.0, WaterML 1.1, hydrodesktop main database, hydrodesktop metadata cache database and model objects.
If the database schema or WaterML changes in the future, only the database abstraction layer will need to be modified.

However, the object oriented model designed by David Valentine and the series-centric HydroDesktop database schema designed by Jeff Horsburgh are not quite compatible. Major differences are:

  1. Qualifiers: In HydroDesktop a data value can only have one qualifier. In the object model a data value can have multiple qualifiers
  2. Time units: In HydroDesktop time units are specified at the variable level. In the object model time units are specified at the series level.
  3. Site code: In HydroDesktop a site can only have one site code. In the object model a site can have multiple site codes.
  4. 'Network': In HydroDesktop the 'Network Prefix' is a part of the site code.  In the object model a site belongs to a 'Network' and 'Network' is a property of the site. Also it is not clear whether 'Network' is synonym for 'WaterOneFlow web service'.


Either the HydroDesktop database schema needs to be changed or the object model needs to be modified. Which of these two options would you prefer?

A) Create a simplified version of the object model which will be compatible with the current HydroDesktop database schemas.

B) Modify the HydroDesktop database schema to make the database more compatible with the object model. This would involve adding some new tables, removing some columns and moving the TimeUnitsID foreign key from Variables to the DataSeries table.

Any suggestions from the team will be appreciated.

Dec 1, 2009 at 8:09 PM

I haven't decided which approach I think is better, but I think the following things should be considered.

1. The more generic elements of the object model, such as multiple qualifiers per data value, are there to accommodate real examples of how data are being served.  For example, if a value is "estimated" (e) and also "approved for publication" (A) by the USGS, the data value gets both an "e" and an "A" qualifier (sent over the wire as "Ae").  Now, I don't think it's feasible to create a model that's generic enough to support every single case of how data are currently being served online.  However, two of the services that are motivating this generic object model design happen to be hosted by the USGS and the EPA, two of the nation's biggest sources of water data.  It seems like it'd be a good idea if we could fully accommodate them.

 

2.  Whatever solution is chosen, I think it has to be easy for plugin developers to understand and work with.  If handling multiple site codes and qualifiers makes things very difficult to work with, even with a fancy abstraction layer, then I don't think you'll have many developers contributing to the project. 

 

Dec 7, 2009 at 10:35 PM

DataServiceInfo has a ServiceTitle property, but ServiceTitle is not an attribute in the metadata database schema (as far as I can tell).  Should it be?

 

Dec 7, 2009 at 11:03 PM

The ServiceTitle property is only used for services registered by HIS Central. This is why it is not included in the MetadataCache database schema. The ServiceTitle is meant to be a brief description of the web service which is entered by the web service manager person when he registers his service at HIS Central.

 

Dec 8, 2009 at 1:28 AM

Having seen the AddServiceForm in the metadata fetcher plug-in, I think it makes sense to add the ServiceTitle column to the MetadataCache database. This won't break any existing other plug-ins because right now only the metadata fetcher works with the Metadata Cache database.

I have added the ServiceTitle field to the DataServices table in the Metadata Cache database (in /Databases/MetadataCache.sqlite)

 

Dec 16, 2009 at 9:28 PM

What's the design philosophy regarding what does into the cache database?  I'm looking at the one currently on SVN, and the DataSeriesCache table has QualityControlLevelID of 1, but the QualityControlLevelsCache table doesn't have any records in it.  This causes the MetadataCacheManager class to throw an error in DeleteRecordsForService, because it NHibernate can't find the quality control level. 

I can program  the MetadataCacheManager to ignore null values, but the question is, should I?  Or is it better to throw an error so that the user is alerted that something is wrong with the database. 

Dec 17, 2009 at 5:56 AM

The metadata cache database should simply be able to support searches based on our 5 point search criteria 1) where, 2) when, 3) what, 4) services, 5) ? hmmm can't remember the fifth one. In any case, it's probably OK to drop the QualityContrlLevelsCache since it's not required for any of these search criteria... 

Dec 17, 2009 at 7:58 PM

I think I know what is causing the error with DataSeriesCache.QualityControlLevelID = 1 and no records in QualityControlLevelsCache table.

I have an error in the MetadataCacheManager.DeleteRecordsForService() method.

  1. A list of series from DataService A is saved to the database.
  2. A list of series from DataService B is saved to the database. Some series from DataService B have the same QualityControlLevel as the series from DataService A.
  3. Later, user chooses to delete DataService A.
  4. the method DeleteRecordsForService(DataService A) is called
  5. In the method DeleteRecordsForService, all series including their variables, sites, methods, sources and quality control levels associated with the service are deleted.
  6. As a result, some rows in DataSeriesCache from DataService B have an invalid QualityControlLevelID.

I thought NHibernate and SQLite wouldn't allow deleting the rows in the QualityControlLevelsCache table in this case, but testing shows that's not true.

A quick solution could be:

In DeleteRecordsForService, only delete rows from DataSeriesCache, SitesCache and VariablesCache. Don't delete the rows from MethodsCache, SourcesCache and QualityControlLevelsCache.

Jan 4, 2010 at 4:42 PM

Dan, the five items that define a series in ODM are:

Site, Variable, Method, Source (not necessarily the same thing as "Data Service"), and Quality Control Level

 

I think QC Levels from service A and service B should have different QualityControlLevelIDs in the database.  I'm guessing what's happening now is, if service A includes a QC ID of 1, and service B also has QC ID of 1, then only one QC level gets written to the database, and it has a QualityControlLevelID of 1.  What should be happening, is that a unique QualityControlLevelID should be assigned for each new QC level.  I think we need this because it's possible that service A's QC ID of 1 could mean something different than service B's QC ID of 1 (e.g., "Approved" vs. "Outliers Removed").  Also, we should use the OriginalQualityControlLevelID field to store the QC ID that was returned from the web service.  Given the example above, the resulting QC table would look like this:

QUALITYCONTROLLEVELSCACHE

QualityControlLevelID  OriginalQualityControlLevelID  Definition

1                                                  1                        Approved

2                                                  1                        Outliers Removed 

 

The same rule should apply for Methods, and anything else where we record IDs from the data service query results.

Jan 4, 2010 at 5:08 PM

I noticed in the Search plugin, in SeriesConverter.AddToFeatureSet, there's a line like this:

row["wsdl"] = series.Source.Link;

The link in the Source description for a series isn't necessarily the link to the web service.  Often it points to the organization who is the source for the data or a website about the project that generated the data, such as http://littlebearriver.usu.edu/. 

I think we need some way to associate a series with a data service in the object model.  Perhaps a series object can have a ServiceId.  Or perhaps it can have a DataServiceInfo property that returns a DataServiceInfo object.  This would allow us to change the line of code above to the following:

row["wsdl"] = series.DataService.EndPointUrl;

Also, I think DataServiceInfo needs a ServiceId property which stores the internal database ID for a data service. 

Thoughts?

Jan 4, 2010 at 5:31 PM

I see what you mean. Basically since we don't have a controlled vocabulary for quality control levels, then we have this problem that two services may be using the same ID for different control levels... this will need to be fixed at the HydroDesktop level it seems...

Jan 5, 2010 at 9:53 PM

Regarding the DataServiceInfo.ServiceId property, it looks like DataServiceInfo.Id is serving this purpose, so at least that issue is resolved.

 

Jan 6, 2010 at 6:35 PM
Edited Jan 6, 2010 at 6:37 PM

If a series was created from the metadata cache, then it can be cast as a SeriesCache object which has a DataService property that gives you a DataServiceInfo object.  So, I think we're ok in getting the URL from local cache searches.  I think the way we tie it to series from HIS Central services could still be improved.