Create a data validation and verification plan that can serve as a strict, workable data integrity guideline for comparing incoming laboratory reports against the data that has been integrated into the analysis table in SQL. The goal is to ensure that the reports submitted month by month contain the same information as what is being integrated into the analysis table. Please see the instructions below:
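As a concrete starting point, the month-by-month comparison can be expressed as a reconciliation query that lists records present in a submitted report but absent from the analysis table, and vice versa. The following is a minimal T-SQL sketch only; the table names (stg.Lab_Report, dbo.Analysis_Table), the key columns (EntityID, CaseID, SubmissionID, ItemID), and the ReportMonth filter are hypothetical placeholders, not the actual NFLIS schema.

    -- Records submitted for the month under review but not present in the
    -- analysis table. All object and column names are assumed placeholders.
    SELECT EntityID, CaseID, SubmissionID, ItemID
    FROM stg.Lab_Report
    WHERE ReportMonth = '2025-01'          -- month under review
    EXCEPT
    SELECT EntityID, CaseID, SubmissionID, ItemID
    FROM dbo.Analysis_Table;

    -- Records in the analysis table for the same month with no matching
    -- submitted record (possible duplicates, manual entries, or key mismatches).
    SELECT EntityID, CaseID, SubmissionID, ItemID
    FROM dbo.Analysis_Table
    WHERE ReportMonth = '2025-01'
    EXCEPT
    SELECT EntityID, CaseID, SubmissionID, ItemID
    FROM stg.Lab_Report;

Run against each monthly submission, a pair of queries like this gives an immediate count of missing and unexpected records per entity before any field-level checks are attempted.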
This task is envisioned as a long-term component of the overall work responsibilities of the NFLIS data management team. For 2025, the task would be oriented exclusively around NFLIS-DRUG. For our purposes, validation and verification entails developing systems to measure the reliability with which laboratory analytical results are reflected in the NFLIS data available for analysis. The following issues can affect this reliability:
1. Laboratory processes, including analytical processes, LIMS systems, data inputs, data definitions, data submitted to the laboratory, data extract processes, data gaps, laboratory participation, and other variables.
2. Data management prior to processing.
3. Data ingestion through SSIS, including mapping of data fields, data conversions, data manipulation, manual manipulations, and automated manipulations.
4. Data processing using SQL, including mapping of incoming data fields to NFLIS data elements, conversion of structured and unstructured data to NFLIS codes, substance processing, date processing, geospatial processing, metadata processing, case/submission/item processing, exceptions, manual manipulations, unintended processes, and other SQL manipulations (a field-level check of this processing is sketched below).
The data can also be affected by Oracle processing and data extraction queries, but those elements are outside of the current scope of data management.
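Item 4 in particular lends itself to field-level verification. As sketched below, the substance code stored in the analysis table can be compared against the code that the documented crosswalk would produce from the raw substance text submitted by the laboratory. The tables and columns used here (stg.Lab_Report.RawSubstance, ref.Substance_Crosswalk, dbo.Analysis_Table.SubstanceCode) are hypothetical placeholders, and a real check would need to account for items with multiple reported substances; the point is only to illustrate the kind of comparison intended.

    -- Items whose stored NFLIS substance code disagrees with the code that the
    -- crosswalk yields for the submitted raw substance text, or whose raw text
    -- has no crosswalk entry at all. All names are assumed placeholders.
    SELECT
        a.EntityID,
        a.CaseID,
        a.ItemID,
        s.RawSubstance,
        x.NFLISCode     AS ExpectedCode,
        a.SubstanceCode AS StoredCode
    FROM dbo.Analysis_Table AS a
    JOIN stg.Lab_Report AS s
      ON s.EntityID = a.EntityID
     AND s.CaseID   = a.CaseID
     AND s.ItemID   = a.ItemID
    LEFT JOIN ref.Substance_Crosswalk AS x
      ON x.RawSubstance = s.RawSubstance
    WHERE x.NFLISCode IS NULL
       OR a.SubstanceCode <> x.NFLISCode;

Analogous checks can be written for dates, locations, and metadata elements once the corresponding mapping rules are documented.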
We seek to provide a complete and reliable view of all seized drug analyses in all forensic laboratories. The validation and verification task measures the quality of our data management by examining the data inputs from contributing entities and the results reflected in NFLIS-Drug. Your work is not designed to be comprehensive in this regard, because the DEV team will be conducting checks based primarily on queries of the NFLIS-Drug data itself and because our view of the elements related to item 1 above is necessarily limited. The best approach is to compare incoming data to NFLIS-Drug to determine the extent to which we can judge the overall reliability of our data management. The validation and verification should keep in mind all of the variables in items 1 through 4 but start with a “black box” view of the problem. In other words, we should start with the following products:
Does NFLIS-Drug correspond to the input data elements from the contributing entities for case, submission, and item information?
Does NFLIS-Drug correspond to the input data elements for the following (an entity-level summary of these checks is sketched after this list):
Substances
Dates
Location of seizure (or other location as provided by the entity)
Metadata elements (quantity, form, purity, etc.)
Do submitted data files correspond to our expectations for each of these data elements?
For example, is the location provided actually the location of seizure? Are only confirmed substances provided? There are many other questions that can be asked, and these will flow into our engagement with individual contributing entities/labs and LIMS vendors.
We would like to have this data on an entity-by-entity basis.
We would like to have this data in real time to monitor our progress as a program.
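One way to meet these needs, and to answer the element-level correspondence questions above on an ongoing basis, is to roll the individual checks up into agreement rates per contributing entity and reporting month, refreshed as data arrives. The sketch below is illustrative only: the elements compared, the table names (stg.Lab_Report, dbo.Analysis_Table, ref.Substance_Crosswalk), and the columns (SeizureDate, SeizureLocation, RawSubstance, SubstanceCode) are hypothetical placeholders for whatever the actual incoming and processed structures contain.

    -- Per-entity, per-month agreement between submitted values and the values
    -- stored in the analysis table. All names are assumed placeholders.
    -- NOTE: assumes one row per item in both tables; multi-substance items
    -- would need additional handling.
    SELECT
        s.EntityID,
        s.ReportMonth,
        COUNT(*) AS SubmittedItems,
        AVG(CASE WHEN a.ItemID IS NOT NULL                  THEN 1.0 ELSE 0.0 END) AS ItemIntegrationRate,
        AVG(CASE WHEN a.SeizureDate     = s.SeizureDate     THEN 1.0 ELSE 0.0 END) AS DateMatchRate,
        AVG(CASE WHEN a.SeizureLocation = s.SeizureLocation THEN 1.0 ELSE 0.0 END) AS LocationMatchRate,
        AVG(CASE WHEN a.SubstanceCode   = x.NFLISCode       THEN 1.0 ELSE 0.0 END) AS SubstanceMatchRate
    FROM stg.Lab_Report AS s
    LEFT JOIN dbo.Analysis_Table AS a
           ON a.EntityID = s.EntityID
          AND a.CaseID   = s.CaseID
          AND a.ItemID   = s.ItemID
    LEFT JOIN ref.Substance_Crosswalk AS x
           ON x.RawSubstance = s.RawSubstance
    GROUP BY s.EntityID, s.ReportMonth
    ORDER BY s.EntityID, s.ReportMonth;

Persisting a query like this as a view, and tracking the rates over time, would provide the entity-by-entity, near-real-time picture described above and a baseline for the objectives that follow.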
The data should be used to inform the following objectives:
A complete picture of the reliability issues as outlined above
Data management process improvement
Data management and processing troubleshooting and root cause analysis
Data processing improvements
Modernization
Entity-by-entity engagement to improve the value of data provided to the program
Strategic engagement with the field to promote analytical and data standards that improve the quality and reliability of data