“Data Science” Volunteer Projects & Apprenticeships


The day will start with short project presentations in which speakers will discuss their goals, current status, and needs. Once all the presentations have been made, we will break for lunch. We will also present a longer list of possible “open data” projects.


After (and/or during) lunch we will get together, and (1) experienced analysts and programmers can immediately begin to work on projects, and (2) there will be tutorials that tie into project goals, both for beginners and for experienced folks who would like a refresher. Participants will learn relevant “Data Science” skills that can then be applied to projects. Depending on the interests of participants, these tutorials and hands-on project activities could be held one after another or in parallel. We will then work on the various projects until there is a good stopping point, or until around 5-6 pm when Microsoft closes.


Saturday | Tech Project | Presenter / Slides | Afternoon Tutorial and/or Goals
9:00 - 9:15
Tech Project: Overview of "Data Science for Social Good" Programs. It’s an exciting time to be a data scientist who cares about social impact. Several DSSG programs have sprung up around the globe. Peter will walk us through the efforts under way at universities, governments, non-profits, corporations, and communities, and talk about how you can get involved.
Presenter / Slides: Peter Bull of DrivenData; Data Science for Social Good Slides
Afternoon Tutorial and/or Goals: The tutorial will be instructional, walking through a project from start to finish.
9:15 - 9:30
Tech Project: HubEvents is a weekly calendar where local events pertaining to Climate Change and other social good topics are manually aggregated. The calendar draws from several organizations; however, dozens more are not included because event data is gathered by hand. A new project is underway to automate parts of this process, using Python for web scraping and for pulling data from various APIs. When the project is completed, it will be possible for the first time to see what is going on at all the universities and institutions in the Boston area. People concerned with energy issues will be aware of every talk and event in the city, and the mathematics student will know about all the lectures and seminars.
Presenter / Slides: George Mokray of HubEvents and Jon Halverson, Volunteer Python Programmer; Web Scraping Slides
Afternoon Tutorial and/or Goals: A tutorial on "web scraping" will be given. Event calendars will then be scraped and aggregated together.
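For anyone new to the idea, the scrape-and-aggregate step described above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library, not the HubEvents project's actual code: the page markup (`<li class="event">` items with a date span and a link), the calendar names, and the sample events are all made up for the example, and real calendars will each need their own parsing rules.

```python
from html.parser import HTMLParser


class EventParser(HTMLParser):
    """Collect events from <li class="event"> items (hypothetical markup)."""

    def __init__(self, source):
        super().__init__()
        self.source = source
        self.events = []
        self._current = None  # the event currently being built, if any
        self._field = None    # which text field ("date" or "title") we are inside

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "event":
            self._current = {"source": self.source, "date": "", "title": "", "url": ""}
        elif self._current is not None:
            if tag == "span" and attrs.get("class") == "date":
                self._field = "date"
            elif tag == "a":
                self._current["url"] = attrs.get("href") or ""
                self._field = "title"

    def handle_data(self, data):
        if self._current is not None and self._field:
            self._current[self._field] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("span", "a"):
            self._field = None
        elif tag == "li" and self._current is not None:
            self.events.append(self._current)
            self._current = None


def scrape(source, html):
    """Parse one calendar page and return its events."""
    parser = EventParser(source)
    parser.feed(html)
    parser.close()
    return parser.events


def aggregate(pages):
    """Merge events from several calendars, drop duplicates, sort by date."""
    seen = set()
    merged = []
    for source, html in pages.items():
        for event in scrape(source, html):
            key = (event["date"], event["title"].lower())
            if key not in seen:
                seen.add(key)
                merged.append(event)
    return sorted(merged, key=lambda e: (e["date"], e["title"]))


# Made-up calendar pages so the sketch runs without network access.
SAMPLE_PAGES = {
    "University A": """
      <ul>
        <li class="event"><span class="date">2016-03-08</span>
            <a href="https://example.org/a/1">Energy Policy Seminar</a></li>
        <li class="event"><span class="date">2016-03-05</span>
            <a href="https://example.org/a/2">Applied Math Colloquium</a></li>
      </ul>""",
    "Institute B": """
      <ul>
        <li class="event"><span class="date">2016-03-08</span>
            <a href="https://example.org/b/9">Energy Policy Seminar</a></li>
        <li class="event"><span class="date">2016-03-10</span>
            <a href="https://example.org/b/10">Climate Data Workshop</a></li>
      </ul>""",
}

if __name__ == "__main__":
    for event in aggregate(SAMPLE_PAGES):
        print(event["date"], event["title"], "(" + event["source"] + ")")
```

In practice the HTML would be downloaded from each organization's calendar page rather than read from a string, and an event listed by two organizations is kept only once, under the first calendar that listed it.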
9:30 - 9:45
Tech Project: Boston's "Data Science for Social Good" Ecosystem. This talk describes the Meetup Tech Ecosystem using data from the Meetup API. The data was analyzed using R and various algorithms, including social network analysis. The Meetup analysis complements an ongoing effort to bring the Meetup Tech Community together with the multitude of government agencies, non-profits, university departments, community groups, and industry associations. This effort is local, national, and to some extent global.
Presenter / Slides: John Verostek of DSSG Meetup group; Overview Slides; API Overview
Afternoon Tutorial and/or Goals: A tutorial on pulling data via an API will be given first. The goal is to understand more about Boston's Social Good ecosystem.
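As a rough picture of what pulling data via an API and then running a simple network analysis looks like, here is a small Python sketch. It is an assumption-laden illustration rather than the analysis from the talk (which used R): the endpoint `https://api.example.org/groups`, the query parameter names, and the response shape are hypothetical and are not the real Meetup API, and the sample response is made up.

```python
import json
from itertools import combinations
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint used for illustration only; not the real Meetup API.
BASE_URL = "https://api.example.org/groups"


def build_url(topic, city, page=1):
    """Compose a query URL; the parameter names are assumptions."""
    return BASE_URL + "?" + urlencode({"topic": topic, "city": city, "page": page})


def fetch_groups(url):
    """Download one page of results and decode the JSON body (needs network access)."""
    with urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def shared_member_edges(groups):
    """Link two groups when they share members; the weight is the number shared."""
    edges = {}
    for a, b in combinations(groups, 2):
        shared = set(a["members"]) & set(b["members"])
        if shared:
            edges[(a["name"], b["name"])] = len(shared)
    return edges


# Made-up response body in the assumed shape, so the sketch runs offline.
SAMPLE_RESPONSE = json.loads("""
[
  {"name": "Group A", "members": [1, 2, 3, 4]},
  {"name": "Group B", "members": [3, 4, 5]},
  {"name": "Group C", "members": [6]}
]
""")

if __name__ == "__main__":
    print(build_url("data science", "Boston"))
    for (left, right), weight in shared_member_edges(SAMPLE_RESPONSE).items():
        print(left, "<->", right, "shared members:", weight)
```

The edge list is the usual starting point for social network analysis: groups that share many members sit close together in the resulting graph, while groups with no overlap are left unconnected.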
9:45 - 10:45
Tech Project: Identifying Public Spaces for Innovation using the City Software Development Kit (SDK) from the U.S. Census. "Data Science for Social Good" Fellows at the University of Chicago have used the CitySDK to create datasets and a tool that help community organizers understand local communities. Alexandra will walk through this project, which leveraged Google Fusion Maps. One goal will be to understand to what degree this project can be adapted to Boston.
Presenter / Slides: Alexandra Barker of the U.S. Census Department; US Census Slides; City SDK; City SDK Gallery; Census Module
Afternoon Tutorial and/or Goals: This project ties into both of the above. The Census API will be used to understand more about Boston's Public Innovation Spaces.
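To give a feel for working with the Census API directly, the sketch below builds a query URL and turns the reply into records. It is a sketch under stated assumptions and not the CitySDK project's code: the choice of the ACS 5-year dataset, the year, and the total-population variable `B01003_001E` are ours for illustration, and the sample rows are placeholders rather than real figures. The Census Data API returns a JSON list of rows whose first row holds the column names, which is the shape assumed here.

```python
from urllib.parse import urlencode

CENSUS_BASE = "https://api.census.gov/data"


def acs5_url(year, variables, geography, within=None):
    """Build a query URL for the ACS 5-year dataset (dataset choice is an assumption)."""
    query = [("get", ",".join(variables)), ("for", geography)]
    if within:
        query.append(("in", within))
    # Keep ',', ':' and '*' readable instead of percent-encoding them.
    return f"{CENSUS_BASE}/{year}/acs/acs5?" + urlencode(query, safe=",:*")


def rows_to_records(rows):
    """Convert [header, row, row, ...] into a list of dicts keyed by column name."""
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


# Placeholder rows in the header-first shape; the names and values are not real data.
SAMPLE_ROWS = [
    ["NAME", "B01003_001E", "state", "county"],
    ["Example County", "12345", "00", "001"],
    ["Sample County", "67890", "00", "003"],
]

if __name__ == "__main__":
    # State code 25 is Massachusetts; "county:*" asks for every county in it.
    print(acs5_url(2019, ["NAME", "B01003_001E"], "county:*", "state:25"))
    for record in rows_to_records(SAMPLE_ROWS):
        print(record["NAME"], int(record["B01003_001E"]))
```

Values come back as strings, so numeric columns need converting before any arithmetic or mapping.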
10:45 - 11:00
11:00 - 11:10
Tech Project: Crowdsourcing Information on the Quality of Health Services in Emerging Markets. Tulalens is a social enterprise with a vision to transform under-served communities from passive recipients of critical services to knowledgeable consumers. Simply put, we're a Yelp for under-served communities. We've manually demonstrated that pregnant women in urban slums in India made better decisions about where to seek healthcare when equipped with crowdsourced information. We're now building a technology platform that can automate data sourcing and dissemination, allowing us to scale to other geographies, demographics, and services.
Presenter / Slides: Priya Iyer, Founder of Tulalens
Afternoon Tutorial and/or Goals: Priya has a dataset covering the people she is trying to help. She needs a hand in understanding the patterns and trends in her customer dataset so that she and her team can provide better services to the women they help.
11:10 - 12:00
Lunch / Pizza
12:00 - 5:00: Working together on projects
Boston Innovation Spaces & Social Good Ecosystem: John Verostek; George Mokray of HubEvents; and Jon Halverson, Volunteer Python Programmer
Crowdsourcing Health Services Information: Priya Iyer
DrivenData Project: Peter Bull