© 2020 Medic photo archives
By Mitali Ayyangar, Matthew Harris, Caitlin Augustin, and Ivy Wang
Background
Data is revolutionizing how people access health services. For community-based care providers, the ability to target treatments has the potential to be transformative. Digital tools in the hands of Community Health Workers (CHWs) make it easier to collect routine health information closer to where people live, understand their health needs better, and treat them faster, saving (more!) lives. Such routine and accessible digital health information is critical for supporting CHWs to lead the charge in combating pandemics, infectious disease outbreaks, and other public health emergencies, particularly in places where community health services account for the majority of primary care visits.
To provide first-class care to all people, health system actors need to build and maintain confidence in the data collected in service of these public health priorities. A common pain point is that data collected at the frontlines of care are still considered too unreliable to inform decision-making. Improving data quality is essential to countering this mistrust and to realizing the potential of community health data to increase access to quality healthcare and advance universal health coverage. Multilateral organizations, national governments, and a host of subject matter expert entities have recognized this as a clear need in their recommendations and guidance¹.
It's important to note that, far from being unique to community health, concerns about data accuracy, completeness, timeliness, and other dimensions of data quality are ubiquitous in any human-collected data system, especially as datasets (about the public and by the public) become larger and more diverse².
Technical Exploration and Human-Centered Design
DataKind began exploring solutions for improving data quality in 2019 as one part of our Frontline Health Systems portfolio. In collaboration with our partner Medic, DataKind has explored opportunities to disrupt the compounding cycle of poor data quality that is pervasive in large deployments of digital data collection tools and digitally enabled community health information systems³.
Building on the prototype developed in 2020, DataKind and Medic have progressively identified sustainable pathways to increasing trust in public health data through changes in tooling, training, and communication.
DataKind also contributed to building a responsible digital health solution as part of iCoHS (Intelligent Community Health Systems), a powerful consortium designed to support Uganda's Ministry of Health (MoH) in its efforts to use and nationally scale digital health solutions. This work gave us the opportunity to determine how best to meaningfully support health system and community health managers. Through this engagement, DataKind and Medic led human-centered design and user-experience research with a number of NGO and MoH stakeholders, from the community to the national level. The research identified the data quality problems most commonly encountered (or perceived to be ubiquitous). It also identified high-priority scenarios with the potential to create compounding problems.
The findings shaped DataKind's solution design: the solution must be sustainable, part of a well-maintained software infrastructure, and a global good. In the latest phase, we set out to develop a solution that is:
- Reusable (i.e., able to monitor data consistently and on an automated cadence)
- Accessible (easy to configure through a web-based user interface, or UI, which reduces the technical skills needed for configuration)
- Flexible (able to detect more than classic data quality problems, for example, problems with protocol adherence)
Introducing the Data Observation Toolkit (DOT)
The Data Observation Toolkit (DOT) is an open-source, community-informed toolkit capable of automated monitoring and detection of inconsistent or problematic data in a relational database. DOT is designed such that it can sit as close as possible to the point at which CHW-gathered data syncs with the server and enters the database – at which point a series of tests can be applied.
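The idea of testing data close to the point of sync can be sketched in a few lines of Python with an in-memory SQLite database. The sketch assumes a hypothetical `chw_reports` table with a `synced_at` column; neither the table nor the functions come from DOT. Each test is a SQL query that selects the rows that violate an expectation, so a test passes when it returns no rows. This is the same convention dbt uses for its data tests. Every query looks only at rows synced after the previous run.

```python
import sqlite3

# Hypothetical table of CHW reports as they land on the server (names are illustrative).
SCHEMA = """
CREATE TABLE chw_reports (
    report_id   TEXT,
    patient_id  TEXT,
    reported_at TEXT,  -- when the visit is said to have happened (ISO 8601)
    synced_at   TEXT   -- when the record reached the server (ISO 8601)
);
"""

# Each test selects the rows that violate an expectation; zero rows means it passes.
# Tests only look at rows synced after the previous run (:since), so new data is
# checked as it arrives instead of rescanning the whole table every time.
TESTS = {
    "patient_id_not_null": """
        SELECT report_id FROM chw_reports
        WHERE synced_at > :since AND patient_id IS NULL
    """,
    "reported_before_synced": """
        SELECT report_id FROM chw_reports
        WHERE synced_at > :since AND reported_at > synced_at
    """,
}


def run_tests(conn, since):
    """Return {test_name: [failing report_ids]} for rows synced after `since`."""
    return {
        name: [row[0] for row in conn.execute(sql, {"since": since})]
        for name, sql in TESTS.items()
    }


def build_demo_db():
    """Build a small in-memory database with made-up example rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO chw_reports VALUES (?, ?, ?, ?)",
        [
            ("r1", None, "2022-02-27", "2022-02-28"),  # synced before last run: skipped
            ("r2", "p2", "2022-03-01", "2022-03-02"),  # passes both tests
            ("r3", None, "2022-03-02", "2022-03-03"),  # missing patient_id
            ("r4", "p4", "2022-03-09", "2022-03-04"),  # dated after it was synced
        ],
    )
    return conn


if __name__ == "__main__":
    print(run_tests(build_demo_db(), since="2022-03-01"))
```

Restricting each run to newly synced rows keeps the checks cheap enough to run on an automated cadence. Row `r1` has a missing `patient_id`, but it is not reported again because an earlier run would already have flagged it.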
At its core, DOT uses two powerful data testing and validation libraries: dbt and Great Expectations. DOT builds on the out-of-the-box tests from both libraries to support classic data quality scenarios as well as common scenarios in the community health domain. These include specific protocols for the community case management of childhood illnesses (malaria, pneumonia, and diarrhea) and for maternal, newborn, and child health.
DOT also provides a simplified UI as a management layer where tests can be easily configured. Test results are saved to a DOT database so that data integrity can be tracked over time. Among DOT's most useful features are its ability to monitor multiple databases and its Docker build for easier deployment.
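Protocol-adherence checks can be written in the same failing-rows style as classic data quality tests. The sketch below assumes a hypothetical `assessments` table and a deliberately simplified rule: a positive malaria RDT should be accompanied by a recorded antimalarial treatment. It illustrates the pattern only. It is neither one of DOT's actual tests nor clinical guidance.

```python
import sqlite3

# Hypothetical assessment table; the columns are illustrative, not DOT's schema.
SCHEMA = """
CREATE TABLE assessments (
    assessment_id   TEXT PRIMARY KEY,
    malaria_rdt     TEXT,  -- 'positive', 'negative', or NULL if no test was done
    treatment_given TEXT   -- e.g. 'antimalarial' or 'none'
);
"""

# Hypothetical, simplified protocol rule (illustration only, not clinical guidance):
# a positive malaria RDT should come with a recorded antimalarial treatment.
# Like any failing-rows test, the query selects the records that break the rule.
POSITIVE_RDT_WITHOUT_TREATMENT = """
SELECT assessment_id
FROM assessments
WHERE malaria_rdt = 'positive'
  -- NULL <> 'antimalarial' evaluates to NULL, so missing treatments are checked explicitly
  AND (treatment_given IS NULL OR treatment_given <> 'antimalarial')
ORDER BY assessment_id
"""


def protocol_failures(conn):
    """Return the ids of assessments that break the hypothetical protocol rule."""
    return [row[0] for row in conn.execute(POSITIVE_RDT_WITHOUT_TREATMENT)]


def build_demo_db():
    """Build a small in-memory database with made-up example rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO assessments VALUES (?, ?, ?)",
        [
            ("a1", "positive", "antimalarial"),  # follows the rule
            ("a2", "positive", "none"),          # positive RDT, no antimalarial
            ("a3", "positive", None),            # positive RDT, treatment not recorded
            ("a4", "negative", "none"),          # rule does not apply
            ("a5", None, None),                  # no RDT done: rule does not apply
        ],
    )
    return conn


if __name__ == "__main__":
    print(protocol_failures(build_demo_db()))  # ['a2', 'a3']
```

The explicit `IS NULL` branch matters. In SQL, comparing NULL with `<>` yields NULL rather than true, so an assessment with no recorded treatment would otherwise slip through silently.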
DOT is highly configurable via a web UI and has the ability to monitor multiple databases
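Saving results is what makes it possible to track data integrity over time and across several monitored databases. The sketch below assumes a hypothetical `test_results` table; DOT's real results schema may differ. For each run, the table records which database and test were checked and how many rows failed, and a query then turns those records into a failure-rate trend.

```python
import sqlite3

# Hypothetical results store; DOT's actual results schema may differ.
RESULTS_SCHEMA = """
CREATE TABLE test_results (
    run_date    TEXT,     -- date of the monitoring run (ISO 8601)
    source_db   TEXT,     -- which monitored database was tested
    test_name   TEXT,
    rows_tested INTEGER,
    rows_failed INTEGER
);
"""


def record_result(results_conn, run_date, source_db, test_name, rows_tested, rows_failed):
    """Append one test outcome for one monitored database to the results store."""
    results_conn.execute(
        "INSERT INTO test_results VALUES (?, ?, ?, ?, ?)",
        (run_date, source_db, test_name, rows_tested, rows_failed),
    )


def failure_rate_trend(results_conn, source_db, test_name):
    """Return [(run_date, failure_rate)] for one test on one database, oldest first."""
    return results_conn.execute(
        """
        SELECT run_date, CAST(rows_failed AS REAL) / rows_tested
        FROM test_results
        WHERE source_db = ? AND test_name = ? AND rows_tested > 0
        ORDER BY run_date
        """,
        (source_db, test_name),
    ).fetchall()


def build_demo_results():
    """Results store with made-up runs against two hypothetical databases."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(RESULTS_SCHEMA)
    record_result(conn, "2022-01-01", "district_a", "patient_id_not_null", 100, 10)
    record_result(conn, "2022-02-01", "district_a", "patient_id_not_null", 200, 5)
    record_result(conn, "2022-01-01", "district_b", "patient_id_not_null", 50, 0)
    return conn


if __name__ == "__main__":
    print(failure_rate_trend(build_demo_results(), "district_a", "patient_id_not_null"))
```

Storing a rate rather than a pass/fail flag lets the trend show whether data quality is improving. Keying results by `source_db` lets one results store serve many monitored databases.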
DOT-monitored data and the resulting high-quality datasets increase trust and confidence in community-generated data. This opens up critical pathways for aggregating that data into district- and national-level digital health information systems. This work also lays the foundation for future innovation in community-based care delivery, including the use of artificial intelligence (AI) and machine learning for predictive analytics and precision public health.
The Road Ahead: DOT as a Digital Public Good
Digital Public Goods (DPGs) are free and open-source software, data, AI models, and more that adhere to best practices, do no harm by design, and help attain the Sustainable Development Goals. DOT's origins lie in strengthening digitally enabled community health information applications. The tool is nonetheless highly configurable and can monitor data in any relational database, making it amenable to scaling across platforms and domains.
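A relational data test is ultimately a SQL query, so tests can be generated from configuration and pointed at any table. The sketch below compiles test specs into SQL; the specs are hypothetical, the kind of rows a configuration UI might store. The test type names mirror dbt's built-in generic tests `not_null`, `unique`, and `accepted_values`, but the compiler itself is an illustrative assumption, not dbt's or DOT's implementation. The `facility_reports` table is also made up.

```python
import sqlite3


def quote_ident(name):
    """Quote an SQL identifier by wrapping it in double quotes and doubling inner quotes."""
    return '"' + name.replace('"', '""') + '"'


def compile_test(spec):
    """Turn one (hypothetical) test spec into (sql, params) selecting failing rows."""
    table, column = quote_ident(spec["table"]), quote_ident(spec["column"])
    kind = spec["type"]
    if kind == "not_null":
        return f"SELECT * FROM {table} WHERE {column} IS NULL", ()
    if kind == "unique":
        return (f"SELECT {column} FROM {table} WHERE {column} IS NOT NULL "
                f"GROUP BY {column} HAVING COUNT(*) > 1"), ()
    if kind == "accepted_values":
        values = spec["values"]
        placeholders = ", ".join("?" for _ in values)  # values are bound, not inlined
        return (f"SELECT * FROM {table} WHERE {column} IS NOT NULL "
                f"AND {column} NOT IN ({placeholders})"), tuple(values)
    raise ValueError(f"unknown test type: {kind}")


def run_config(conn, specs):
    """Run every configured test; return {description: number of failing rows}."""
    summary = {}
    for spec in specs:
        sql, params = compile_test(spec)
        key = f'{spec["type"]}:{spec["table"]}.{spec["column"]}'
        summary[key] = len(conn.execute(sql, params).fetchall())
    return summary


# Hypothetical configuration, e.g. as saved by a web UI.
SPECS = [
    {"table": "facility_reports", "column": "facility_id", "type": "not_null"},
    {"table": "facility_reports", "column": "report_id", "type": "unique"},
    {"table": "facility_reports", "column": "status", "type": "accepted_values",
     "values": ["ok", "low", "out"]},
]


def build_demo_db():
    """Build a small in-memory database with made-up example rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE facility_reports (report_id TEXT, facility_id TEXT, status TEXT)")
    conn.executemany(
        "INSERT INTO facility_reports VALUES (?, ?, ?)",
        [("s1", "f1", "ok"), ("s2", None, "low"), ("s2", "f3", "unknown"),
         ("s4", "f4", None), ("s5", "f5", "pending")],
    )
    return conn


if __name__ == "__main__":
    print(run_config(build_demo_db(), SPECS))
```

Only the specs change from one database or domain to the next; the compiler and runner stay the same. In this sketch, identifiers come from trusted configuration and are quoted, while accepted values are passed as bound parameters.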
DOT is already free and open-source, and through sustained support of key donors, DataKind is able to commit to the further development of DOT per the Digital Public Goods Alliance guidelines. Through our partnership with Medic, DOT integration is already available on the Community Health Toolkit, a leading open-source platform for digital health and advanced community health systems used to support tens of thousands of community health workers across 16 countries.
We’d like to thank our partner, Medic, and Chrisgone Adede, Mourice Barasa, and Henok Alemayehu in particular for their support and collaboration. We’d also like to thank Matthew Harris (Data Ambassador), Katy Moore (Project Manager), and Data Experts: Lorenzo Rubio, Lydia Sanyu, Rahul Ragunathan, Iria Enobakhare, and Tyler Dorland for volunteering their time and expertise to DataKind and to the development of DOT.
DataKind is an ally of the Community Health Impact Coalition, which works to ensure that community health workers are professionalized, trained, compensated, integrated, and recognized for their contribution to providing first-class healthcare to all people.
² Shazia Sadiq (2013). Handbook of Data Quality: Research and Practice. Springer Publishing Company, Inc.
³ Caitlin Augustin, Isaac Holeman, Erika Salomon, Helen Olsen, Phil Azar & Mitali Ayyangar (2021). Pathways to Increasing Trust in Public Health Data. CHANCE, 34(3), 24-32. DOI: 10.1080/09332480.2021.1979808