The success of a political campaign depends on its ability to build, communicate with, organize, and empower communities. In an election, the community is the voter base, and it already exists. There is very little time or money for community development. All effort goes immediately into understanding the voter base, developing a plan for working with it, and carrying out that plan.
A campaign's ability to make effective use of state and national voter files is critical to a data-driven political strategy. The voter file accounts for every registered voter at the state, county, and precinct levels and gives the campaign the details it needs to get to work. In their final form, voter files look simple and well structured. But because they draw on extremely diverse data, building these valuable files is a monumental data challenge.
Data diversity is a good thing. It brings more granularity, deeper insights, and greater accuracy to your analysis. But the more diverse the data in an analytics project, the harder it is to transform and blend that data into something useful. (This process of making diverse data useful is widely called data preparation or data wrangling.) The raw data used to build state and national voter files is diverse because its sources are diverse: every state produces voter data differently, and within each state, each county and precinct handles and reports voter data differently. The resulting range of data types, formats, standards, definitions, completeness, and quality is hard to manage when the output must follow a single standard, and in the end every state and national voter file must be standardized.
Jim Harrold explains how his team tackles enormous data diversity to keep pace with an ever-changing political landscape. Specifically, Jim discusses his team's transition from standardizing this data with custom-built scripts, patched over for many years, to a more intuitive and scalable approach built on modern data preparation tooling. Under the old approach, the preparation scripts were truly understood only by the two people who wrote them, which made it difficult for others to contribute or for the team to grow. Scale was hard to plan for or achieve, and quality was uncertain. Since moving to a tooling-based approach, every user can move easily from state to state and understand the recipe for generating each voter file. New users can learn the tools quickly and contribute at any point. The tooling scales as far as the team needs, and all output passes through quality assurance checks.
Jim Harrold is NationBuilder's data services engineer, a role that puts him at the intersection of big data, public service, and politics. Previously, Jim worked at Project VoteSmart and the University of Nebraska Medical Center, where he conducted political research, collected health data, and researched members of Congress. Jim holds an undergraduate degree in political science from the University of Nebraska-Lincoln and a master's degree in international relations and affairs.
©2017, O'Reilly Media, Inc.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with, and does not endorse or review the materials provided at, this event, which is managed by O'Reilly Media and/or Cloudera.