Why Data Provenance Could Be the Next Big Thing for Cybersecurity

Natalie Parra-Novosad
Marketing Manager

Data provenance is the documentation of the origin of a data product or piece of content. It answers questions about where a particular piece of information came from, what tools were used to create it, and whether it was modified over time. This information is often stored in metadata and can be used to verify a data product’s authenticity.
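To make the idea concrete, here is a minimal sketch in Python of what a provenance record might look like. It is an illustration only, not a standard format: the `ProvenanceRecord` class, its field names, and the helper functions are hypothetical, and the sketch assumes a SHA-256 digest taken at creation time is an acceptable fingerprint of the content.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical metadata describing where a piece of data came from."""
    origin: str   # where the data came from
    tool: str     # what was used to create it
    created: str  # creation time, as an ISO 8601 string
    sha256: str   # digest of the content when the record was made


def digest(content: bytes) -> str:
    """Return the SHA-256 hex digest of the content."""
    return hashlib.sha256(content).hexdigest()


def make_record(content: bytes, origin: str, tool: str, created: str) -> ProvenanceRecord:
    """Capture origin, tool, and a fingerprint of the content."""
    return ProvenanceRecord(origin, tool, created, digest(content))


def is_unmodified(content: bytes, record: ProvenanceRecord) -> bool:
    """True if the content still matches the digest stored in its record."""
    return digest(content) == record.sha256
```

A check like `is_unmodified` can only show that content differs from what was recorded. It says nothing about whether the record itself can be trusted, which is why real systems typically also sign or otherwise protect the metadata.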

Data provenance isn’t a new concept. It can be compared to the centuries-old process of art authentication. Certificates of authenticity and records of provenance are used to indicate that a piece of art is the genuine product of a particular artist. Information provided in these certificates might include the materials and tools used to produce it, the date it was completed, and its history of ownership. Sometimes forensic analysis is used to verify a work’s authenticity. Data provenance consists of similar information. Today, verifying the authenticity of data is increasingly necessary to mitigate disinformation campaigns and to prevent data manipulation.

Where We Are Today

Most of us are familiar with “fake news.” Its roots lie in political propaganda going back centuries. Today, disinformation travels at nearly the speed of light, and purveyors on the dark web make it possible for anyone to engage in its creation and distribution. During the 2016 election season, over 25% of voting-age adults visited at least one fake news website, and the top 20 fake news articles were more widely shared on Facebook than the season’s top 20 hard news articles. Bots contribute greatly to this spread of disinformation. According to security firm Imperva, malicious bots accounted for 29% of web traffic in 2016. A recent study from MIT found that false news stories on Twitter are 70% more likely to be retweeted than true stories. When people don’t investigate the origins of articles before sharing them, disinformation campaigns thrive.

In the future, disinformation and data manipulation could be used to attack nation states with more severe and even life-threatening consequences. Examples include manipulating stock market data, altering the data that shows chemical levels in a water treatment plant, or tampering with patient data in hospitals. Cybersecurity experts predict data manipulation will also spread to the corporate world. In some ways, it already has.

Data Manipulation in the Corporate World

Bots and disinformation campaigns are already being employed to distort corporate communications. Josh Ginsberg, CEO of Zignal Labs, reported that people are using fake accounts to amplify false or negative news stories about companies and hurt their bottom line.

As techniques for data manipulation evolve, cybersecurity experts can see how they could be used to change the course of a company’s decision making. Bob Ackerman, managing director for AllegisCyber, described to Xconomy a scenario in which hackers not only steal data from a company’s systems but also manipulate it, causing the company to take a different direction than it normally would. Xconomy also reports that Greg Dracon, an investor in cybersecurity startups, has already heard of targeted cyberattacks designed to manipulate data. One in particular involved modifying a company’s financial documents to try to influence negotiations in an acquisition deal.

As machine learning becomes an integral part of corporate research and reporting, we must be especially careful about the authenticity of data. As Bob Ackerman says, “Machine-learning systems are only as good as the data that they are trained with.” Once tools that utilize machine learning are infected with bad data, they could generate volumes of misleading reports and forecasts.

Getting Ahead of Data Manipulators

In the next few years, we will likely see a rise in the development and popularity of software tools that track data lineage, as well as social media monitoring tools that identify bot activity and fake accounts. For example, Xconomy reports that Joseph Witt, a former lead developer with the NSA, is developing tools that automatically generate provenance data, including a record of every system that touches the data. These tools also register the timing of each transfer and validate the authenticity of logs. Additionally, new media monitoring tools (such as those developed by Zignal Labs) track bot activity and inauthentic conversations around brands and organizations on social media platforms.
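One common way to make a log of this kind tamper-evident is to chain its entries together with hashes, so that changing any earlier entry invalidates everything after it. The Python sketch below illustrates that general technique. It is not a description of how the tools mentioned above work: the function names, the entry fields (`system`, `action`, `timestamp`, `prev_hash`), and the all-zero starting hash are assumptions made for this example.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def _entry_hash(body: dict) -> str:
    """Hash an entry's fields in a stable (sorted-key) JSON encoding."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_event(log: list, system: str, action: str,
                 timestamp: Optional[str] = None) -> dict:
    """Record that a system touched the data, linked to the previous entry."""
    body = {
        "system": system,
        "action": action,
        "timestamp": timestamp or datetime.now(timezone.utc).isoformat(),
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    entry = dict(body, hash=_entry_hash(body))
    log.append(entry)
    return entry


def verify_log(log: list) -> bool:
    """Check that every entry is intact and correctly linked to the one before it."""
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev or _entry_hash(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

In this sketch, calling `append_event` each time a system receives or transforms the data builds up the lineage, and `verify_log` returns `False` if any recorded system, action, or timestamp has been altered afterward. An attacker who can rewrite the entire log could still recompute every hash, so a production design would also need to anchor or sign the chain somewhere the attacker cannot reach.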