By DataKind San Francisco
DataKind San Francisco has a lot to celebrate. Over the past year, we’ve partnered with six nonprofit organizations to augment their strategies and services with data science: Empower Work, Glide Foundation, Hello Sunday Morning, Muso, UFW Foundation, and Worldreader. These collaborations have allowed us to be of service in areas ranging from homelessness and public health to mental wellness and global literacy.
National Volunteer Week is a timely reminder that we owe everything we’ve accomplished to the tireless efforts of our volunteers. Their skills, passion, and grit drive every step in the lifecycle of our projects. To all the volunteers who scoped the projects, wrote the code, advised our nonprofit partners, and helped build the DataKind San Francisco community — thank you!
A few of our volunteers graciously offered to share their experiences leading projects at DataKind San Francisco. Below are their stories about using data science for social good.
Justine Schott
Justine is a Data Analytics Manager at EducationSuperHighway. At DataKind San Francisco, she served as a Data Ambassador for our project with Muso. Muso’s mission is to eliminate preventable deaths rooted in poverty.
Can you tell us a little bit about yourself and your background?
I’ve always been drawn to the magic of statistics. I started my career as an actuary, and I was in awe of how probabilities and models could distill an overwhelming amount of information into a simple package. When I was ready to learn a new industry, I joined EducationSuperHighway (ESH), a nonprofit organization that connects public schools to high-speed internet. Over the past six years at ESH, we’ve accomplished our ambitious mission, and the organization will be sunsetting in 2020. I plan to continue making a positive impact on our community through data-driven insights.
Can you briefly describe the project that you led as a Data Ambassador?
Our partner organization, Muso, is researching the effectiveness of proactive community healthcare in Mali through a three-year study. Muso realized that their formula for creating a primary key, which was based on family and location, was not consistent across the years, because insecurity in Mali forced changes in both families and locations. Muso needed DataKind’s resources to create a probabilistic record linkage model that could link patients across the years and ensure the success of their study.
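For readers new to the technique: probabilistic record linkage scores each pair of records by how strongly their field-by-field agreements point toward a true match. One common formulation is the Fellegi-Sunter model, which adds a log likelihood ratio per field. The post doesn’t say which formulation the team used, so this is only an illustrative sketch: the field names (`first_name`, `last_name`, `birth_year`) and the m/u probabilities are hypothetical values, not Muso’s data.

```python
import math

# Hypothetical per-field probabilities (illustrative only, not from the project):
#   m = P(field agrees | both records are the same patient)
#   u = P(field agrees | records belong to different patients)
FIELD_PROBS = {
    "first_name": (0.95, 0.05),
    "last_name": (0.90, 0.10),
    "birth_year": (0.85, 0.02),
}


def match_weight(rec_a, rec_b, field_probs=FIELD_PROBS):
    """Fellegi-Sunter match weight: sum of log2 likelihood ratios per field."""
    total = 0.0
    for field, (m, u) in field_probs.items():
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)              # agreement adds evidence for a match
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts evidence
    return total


# Hypothetical records from two study years.
year1 = {"first_name": "Awa", "last_name": "Traore", "birth_year": 1990}
year2_same = {"first_name": "Awa", "last_name": "Traore", "birth_year": 1990}
year2_other = {"first_name": "Moussa", "last_name": "Keita", "birth_year": 1985}

print(round(match_weight(year1, year2_same), 2))
print(round(match_weight(year1, year2_other), 2))
```

Pairs whose weight exceeds a chosen threshold are treated as matches, and a middle band is often sent to human review. In practice, m and u are usually estimated from the data (for example, with the EM algorithm) rather than set by hand.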
How did your project stretch your skills as a data scientist and a leader?
Having never built a record linkage model, I was excited to solve a problem that was new to me. Until then, I had only had the chance to learn something so completely new when I changed industries. I spent my time reading documentation and scholarly articles to learn best practices, and assembling the right team of volunteers to get the job done. During implementation, I had to learn new Python packages and techniques for writing efficient, accurate code. There were many exciting challenges throughout this project where I didn’t always know the answer, but we figured them out together as a team.
Do you have any stories of challenging moments from your project (and how you resolved them) that you would like to share?
Our team had done amazing work building a model from a sample of the data. However, we were unable to run the model on the entire population on our machines, because computing the Cartesian product (pairing every record with every candidate record) required more computational resources than we had. It was a high priority for the script to run locally, so that it would be accessible to our partners at Muso at no cost. We worked with Muso to refine our blocking criteria, which limit comparisons to record pairs that share key attributes. This shrank the Cartesian product enough for the script to run locally.
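To make the blocking idea concrete, here is a minimal Python sketch that compares the full Cartesian product of two yearly datasets with a blocked version that only pairs records sharing a blocking key. The records, field names, and the choice of birth year as the key are hypothetical; the post doesn’t say which blocking criteria the team settled on. A good key should stay stable over time, so in this setting a location-based key would likely miss families who moved.

```python
from collections import defaultdict
from itertools import product


def full_pairs(records_a, records_b):
    """Full Cartesian product: len(records_a) * len(records_b) comparisons."""
    return list(product(range(len(records_a)), range(len(records_b))))


def blocked_pairs(records_a, records_b, block_key):
    """Pair records only when they share the same blocking-key value."""
    # Index the second dataset by blocking key once.
    index_b = defaultdict(list)
    for j, rec in enumerate(records_b):
        index_b[block_key(rec)].append(j)
    # Each record in the first dataset is compared only with its own block.
    pairs = []
    for i, rec in enumerate(records_a):
        for j in index_b.get(block_key(rec), []):
            pairs.append((i, j))
    return pairs


# Hypothetical patient records from two study years.
year1 = [
    {"name": "Awa", "birth_year": 1990},
    {"name": "Moussa", "birth_year": 1985},
    {"name": "Fatou", "birth_year": 2001},
    {"name": "Issa", "birth_year": 1979},
]
year2 = [
    {"name": "Awa", "birth_year": 1990},
    {"name": "Fatou", "birth_year": 2001},
    {"name": "Issa", "birth_year": 1979},
    {"name": "Oumar", "birth_year": 2010},
]

by_birth_year = lambda rec: rec["birth_year"]  # hypothetical blocking key
print(len(full_pairs(year1, year2)), "pairs without blocking")
print(len(blocked_pairs(year1, year2, by_birth_year)), "pairs with blocking")
```

Here blocking cuts 16 comparisons down to 3. On a real population the savings grow quickly: the full product scales with the product of the two dataset sizes, while blocked comparisons scale only with the sizes of the individual blocks.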
What was the most interesting fact you learned about challenges that nonprofits face?
Initially, our model produced an average of eight matches per patient. Although we all agreed this was a success given the data available, an adjustment was necessary to move the project forward given the limited resources available for reviewing matches. To work within this constraint, we modified the model so that patients with multiple matches were automatically paired with the match that shared their original primary key. This reduced the average to one and a half matches per patient. It made me realize how resource constraints affect nonprofits holistically, even in unexpected areas such as which data models a nonprofit can adopt.
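One way to read the rule Justine describes, sketched in Python under our own assumptions: when a patient has several candidate matches and at least one of them carries the same original primary key, keep only that match; otherwise leave all candidates for manual review. The record IDs and key values are hypothetical, and this is an interpretation of the description, not Muso’s production code.

```python
def resolve_matches(candidates, key_year1, key_year2):
    """Collapse multiple candidate matches using the original primary key.

    candidates: year-1 record id -> list of candidate year-2 record ids
    key_year1, key_year2: record id -> original (family + location) primary key
    """
    resolved = {}
    for rec_id, matches in candidates.items():
        if len(matches) > 1:
            # Candidates that share this patient's original primary key.
            same_key = [m for m in matches if key_year2[m] == key_year1[rec_id]]
            if same_key:
                resolved[rec_id] = same_key[:1]  # auto-pair with the first such match
                continue
        # Single match, or no key agreement: keep candidates for manual review.
        resolved[rec_id] = list(matches)
    return resolved


# Hypothetical candidate matches produced by a linkage model.
candidates = {"p1": ["q1", "q2", "q3"], "p2": ["q4"], "p3": ["q5", "q6"]}
key_year1 = {"p1": "FAM01-V1", "p2": "FAM02-V1", "p3": "FAM03-V2"}
key_year2 = {"q1": "FAM09-V3", "q2": "FAM01-V1", "q3": "FAM01-V4",
             "q4": "FAM02-V1", "q5": "FAM07-V2", "q6": "FAM08-V5"}

resolved = resolve_matches(candidates, key_year1, key_year2)
average = sum(len(m) for m in resolved.values()) / len(resolved)
print(resolved)
print(f"average matches per patient: {average:.2f}")
```

In this toy data, the average drops from 2 candidates per patient to about 1.33, because only patient `p3` still needs a human to choose between candidates.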
Jaya Pokuri
Jaya is a Data Scientist at Windfall Data. At DataKind San Francisco, he served as a Data Ambassador for our project with Hello Sunday Morning. Hello Sunday Morning’s mission is to change the world’s relationship with alcohol.
Can you tell us a little bit about yourself and your background?
I’m originally from Raleigh, NC, and studied biomedical engineering. However, after graduating, I decided to pursue data science and moved to the Bay Area, where I’ve since worked as a data scientist at various startups. I look for small, focused teams working on innovative and impactful projects, as I believe that’s how the most change can be created.
Can you briefly describe the project that you led as a Data Ambassador?
As a Data Ambassador, I led a project with Hello Sunday Morning, an Australia-based charity that aims to change the world’s relationship with alcohol. Hello Sunday Morning developed an alcohol support app called Daybreak, where members can communicate and support each other online through posts and comments. However, members sometimes write problematic posts or comments that indicate potentially harmful behavior. Currently, moderators review all posts and comments and escalate potentially problematic ones to clinical staff. As the platform grows and more members join, it’s becoming harder for moderators to keep up with the increasing number of posts and comments.
During this engagement, my team created a solution based on natural language processing (NLP) and machine learning that helps moderators identify problematic posts and comments by outputting, for each post, the probability of risk along with flags indicating whether community guidelines were breached. The solution was delivered as a Flask app for Hello Sunday Morning to use in production.
Were there any moments during your project that made you feel particularly proud?
I’m happy to have had the opportunity to work on this project with very dedicated members of the Hello Sunday Morning team and DataKind volunteers. Seeing this project through from ideation to implementation, with positive outcomes along the way, has made me proud to have been part of the team that made it happen.
How did your project stretch your skills as a data scientist and a leader?
Engaging in this project with Hello Sunday Morning allowed me to leverage my NLP skills more than projects I’ve worked on in the past. Hello Sunday Morning has a rich text dataset, which led to interesting data exploration and analysis. Beyond the technical aspect, this project built my skills as a team lead by putting me in charge of directing several volunteer data scientists and analysts. More specifically, I led the logistical planning and communication involved in forming hypotheses to tackle, deciding which ones were worth investigating, and determining how to allocate tasks among team members.
Has the experience influenced your career trajectory?
Being part of the DataKind team and its projects has given me an appreciation for the impact data can have on social issues. Moving forward in my life and career, I hope to follow a path that allows me to use my data science skills for social good and positive change in society while working with like-minded people.
Seward Lee
Seward is a Data Scientist at Oracle. At DataKind San Francisco, he served as a Data Ambassador for our project with Glide Foundation. Glide’s mission is to alleviate suffering and stabilize lives.
Can you tell us a little bit about yourself and your background?
I started my career as an economic consultant in San Diego, specializing in intellectual property valuation and litigation. My current job involves building analytics and machine learning solutions to help utility companies improve how they offer their water, gas, and electricity services.
I’ve been helping out at DataKind San Francisco since the 2016 DataDive, where I was part of a team that analyzed Conservation International’s wildlife data. I love that working with DataKind San Francisco allows me to collaborate with amazing data scientists and nonprofit partners eager to drive positive social impact.
Can you briefly describe the project that you led as a Data Ambassador?
Glide is a charity organization that provides services to the homeless and marginally housed community in San Francisco. As a member of San Francisco’s End Hep C SF Project, Glide tests the populations at highest risk of hepatitis C infection and guides people who test positive through the treatment process. Our collaboration with Glide involved analyzing their diagnostics and treatment data to understand patients’ progress toward recovery from the disease.
After a couple of rounds of project scoping and data processing, my co-Data Ambassador, Akshaya Vardhan, and I built a team of nine volunteer data scientists to analyze Glide’s data at a DataDive. During the DataDive, we answered a variety of questions that Glide had and built a dashboard for them using Google Data Studio.
Were there any moments during your project that made you feel particularly proud?
Two moments in particular really resonated with me. The first occurred during the last day of the DataDive. A member of our team told me that she really enjoyed working on the project, and that the DataDive made her feel like she was “participating on MasterChef”. Doing good with data is serious work, but I love that the team had fun in the process.
The second moment was during our debriefing call with Glide a few months after the project’s conclusion. I learned that the project sparked fruitful conversations about what it means to be an effective learning organization, and even inspired some folks within Glide to consider learning Python!
How did your project stretch your skills as a data scientist and a leader?
This project taught me a lot about the human side of data science work (and work in general). I’ve learned important lessons about how to build a good team, how to balance efficiency and empathy in project communications, and how to set the right expectations among all stakeholders.
What advice would you like to share with volunteers who are new to DataKind or the Data for Good movement?
The North Star of your project should be what impact it will have on your nonprofit partner’s strategy and operations. Be wary of falling into the trap of simply prioritizing the hottest technology or your favorite tool when you build your solution. Seek to deeply understand the day-to-day work of your nonprofit partner because their needs should always be the main driver of your project scoping process.
Join Us
Interested in learning more about our work? Please check out the DataKind website or follow DataKind San Francisco on Facebook, LinkedIn, and Twitter. If you’re interested in collaborating, we’d love to hear from you!