Many Asian countries, including Indonesia, still live largely in an offline world. Most data are still in analog form and are disconnected because there is no single identity that could link them together. Together, these two factors make it very hard to analyze data for business or economic development. Most people don’t realize that, even today, most of the data preparation process is still done manually and requires a lot of effort from a large number of data scientists, which can be quite impractical.
Big data technology allows disconnected offline data to go digital and connect with one another, enabling companies and governments to get a better view of individuals and other entities, gain contextual insights on what matters to them, and validate and optimize their spending and investments.
Mediatrac has been developing an intelligent data preparation platform on top of Hadoop infrastructure, using Apache Spark, that combines a Knowledge Graph and machine learning to automate the whole process and complete it more quickly. Imron Zuhri shares several data connectivity use cases in the areas of marketing, sales and distribution, finance, telecommunication, healthcare, agriculture, and legal (in both the private sector and government) and explains how to leverage distributed computing to tackle massive entity recognition and resolution problems.
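To make the entity resolution problem concrete, here is a minimal single-machine sketch in plain Python. It is not Mediatrac's method or platform: the blocking key, the similarity threshold, the function names, and the sample records are all assumptions made up for illustration. It shows one common pattern, blocking followed by pairwise matching and clustering, in which records are first grouped by a cheap key so that only records within the same group are compared.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def blocking_key(record):
    """Hypothetical blocking key: first three letters of the name plus the city."""
    return (normalize(record["name"])[:3], normalize(record["city"]))


def resolve(records, threshold=0.85):
    """Cluster record ids that likely refer to the same real-world entity."""
    # Union-find over record ids; each record starts in its own cluster.
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # Step 1 (blocking): group records by a cheap key to avoid all-pairs comparison.
    blocks = defaultdict(list)
    for r in records:
        blocks[blocking_key(r)].append(r)

    # Step 2 (matching): compare names only within each block.
    for block in blocks.values():
        for a, b in combinations(block, 2):
            score = SequenceMatcher(
                None, normalize(a["name"]), normalize(b["name"])
            ).ratio()
            if score >= threshold:  # assumed threshold, not tuned on real data
                union(a["id"], b["id"])

    # Step 3 (clustering): collect ids by their union-find root.
    clusters = defaultdict(list)
    for r in records:
        clusters[find(r["id"])].append(r["id"])
    return sorted(sorted(ids) for ids in clusters.values())


# Made-up sample records for illustration only.
records = [
    {"id": 1, "name": "Toko Sumber Rejeki", "city": "Jakarta"},
    {"id": 2, "name": "toko sumber rejeky", "city": "Jakarta"},
    {"id": 3, "name": "Toko Sumber Rejeki", "city": "Surabaya"},
    {"id": 4, "name": "Apotek Sehat", "city": "Jakarta"},
]
clusters = resolve(records)
print(clusters)
```

In a distributed setting, the blocking key is a natural grouping key: each block can be compared independently of the others, which is what makes the problem amenable to a cluster framework. How the actual platform partitions, matches, and links records is not described here.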
Imron Zuhri is the founder and chief technical director at Mediatrac, where he is responsible for herding the pack of nerds (the data scientists and data engineers) in the company. Together with his wife, Imron also established Erudio School of Art, the only democratic high school of the arts in Indonesia. He has a wide interest in math, physics, astronomy, movies, music, photography, and literature, but first and foremost, he is obsessed with understanding human behavior, perhaps to compensate for his lack of social interaction.
©2016, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.