DATA IS A UBIQUITOUS FORCE IN YOUR daily life, whether you see it (or know it) or not. When people are aware of data’s influence, they’re mostly aware of the negatives. Whether it’s Cambridge Analytica’s misuse of Facebook user data or the massive breach at the consumer reporting company Equifax, the operations and machinations of “big data” are usually viewed with skepticism and distrust.
That’s the gap Data for Democracy is helping to bridge: the distance between data’s bad reputation and its capacity to do good. The organization is focused on creating an extensive decentralized community of data scientists, allowing them to collaborate on data-related problems. It also lets technologists partner with institutions or individuals working on solutions to pressing, complex problems. Since its launch in December 2016, the organization has blossomed into a network of over 3,400 volunteers from across the world, working on a wide range of issue areas and projects.
Jonathon Morgan, the software engineer and data scientist who founded Data for Democracy, didn’t always appreciate the ways data can be used for positive impact at the human level. But in 2013, while working with a nonprofit in Kenya, he began to consider more fully how data can actually help people, rather than just quantify them.
“I really wanted to understand how we can start to use data to improve our understanding of how human beings operate,” says Morgan. “This sounds pretty straightforward and obvious at this point, but at the time there was still a sense that data science was pretty much for optimizing marketing metrics, online advertising clickthrough rates, or [for] quants on Wall Street. That was the arena of data science.”
MORGAN RECALLS THE IMPACT OF THE OFFICE of Science and Technology Policy under the Obama administration, which brought a number of Silicon Valley leaders into the White House, turning their focus to more civic-minded projects. This included people like the mathematician and computer scientist DJ Patil, who helped coin the very phrase “data scientist.” Technologists were encouraged to embark on a “tour of duty”—six, 12, or 18 months—focused on finding solutions to problems.
For Morgan, what was most interesting about this was less the effects those individuals and their projects had on government itself (though there were many). Rather, it was striking that they all came back to the technology industry and started introducing the idea that government and civic participation (using your knowledge and skills to have an impact on society) was really powerful, and actually prestigious, noble, and rewarding.
SHORTLY AFTER THE PRESIDENTIAL ELECTION IN 2016, Morgan and a couple of friends began Partially Derivative, a popular data science podcast. They were invited by Patil, at that point the soon-to-be-outgoing chief data scientist of the Office of Science and Technology Policy, to interview him at the White House.
The attention Morgan and his friends received as part of the interview, and the platform the White House had given them, made it seem like the best time to launch Data for Democracy. Morgan doesn’t see the project as partisan in any way, but he recognizes the 2016 election may well have been a catalyst for other people to get involved.
“I think a lot of people actually hadn’t thought about whether these institutions were really meaningful, and were now having their beliefs challenged by another point of view,” says Morgan. “So people who did feel like civil society, civil discourse, and participating and getting value out of having an impact through these institutions, that had been set up for a couple hundred years … people that valued those things realized that they had to be as active, engaged, and participatory as the folks they disagreed with.”
DATA FOR DEMOCRACY DOES MOST OF ITS WORK via Slack, the communications application, and GitHub, where scientists and others can collaborate on open-source projects. Each of the 77 GitHub repositories it has developed corresponds to a project.
Morgan believes that a few projects stand out as showing the impact Data for Democracy can have. One of these returns to the organization’s founding ideas rather than aiming for a specific outcome. In partnership with Bloomberg Media and the data platform supplier BrightHive, Data for Democracy is working to create a code of ethics for technologists and data scientists, codifying some of the frameworks and intentions that shaped how Data for Democracy would engage with the world. The guidelines, called the “Community Principles on Ethical Data Sharing” (CPEDS), cover data sharing and collaboration among data scientists. They include suggestions for overall practices surrounding the collection, storage, and distribution of data; understanding and minimizing bias in algorithms and models; and taking responsibility for how one’s research is applied. They also aim to identify and guard against potential areas of misuse.
Another standout project tackled election data. Around the 2016 election, there was a lot of talk about the vote being “rigged.” But local voter data is incredibly hard to get: even though the data are public records, each state has different guidelines for submitting that information and leaves it up to each individual district to figure out how best to comply with reporting requirements. The result is a jigsaw puzzle of unstandardized guidelines that makes accessing the data extremely arduous.
“So a team of 20 volunteers went around calling all the secretaries of state on the phone and getting them to fax the voter information to them, then scanning that data into PDFs, running them through software that tried to extract the relevant data, and then putting that into a usable machine format that could be shared with other data scientists,” says Morgan. “For months they hammered that process out and made it available to other researchers. It led to some interesting insights about changes in voter behavior, but more importantly led to a level of transparency that isn’t afforded to the public about election outcomes.”
Coordinating an ever-growing international community and setting up governance structures for it pose their own challenges, though. Morgan sees scaling up in these areas as the central challenge ahead, and one that is integral to continuing Data for Democracy’s work.
“We have a core group, but we’d like to expand that,” says Morgan. “We’d like to be a small city rather than a small town.” With smart growth, Data for Democracy could become one of the world’s largest open-source software projects. And its potential to increase transparency and civic engagement worldwide is hard to overstate. Maybe big data isn’t so scary.