Data theft has been a serious issue for organizations for quite some time, and the worst part is that it often takes a long time to identify a theft. The longer it takes to identify the theft, the harder it is to find a solution. In such serious matters, Hadoop and Big Data can help organizations identify the theft and find a solution in a short span of time.
There are some data theft solutions on the market, but there is still a long way to go before we have something sound to protect our data against theft.
Reputed organizations have been victims of data theft from time to time and have suffered huge losses as a result. The following incidents shed some light on the matter:
For over 8 years, in the US alone, hackers targeted banks, department stores and payment processors and stole millions of credit card and debit card numbers.
KT Corp suffered a huge loss of reputation when two suspects sold the data plan details and contact information of millions of its subscribers.
Even Experian, one of the world’s biggest data monitoring companies, was not spared: it disclosed a huge data breach in which data of customers who had applied for T-Mobile services was stolen.
Hackers stole 76,000,000 customer records from JPMorgan Chase, including account numbers, email IDs and names, and the theft was detected after one month.
These are just a few incidents, and many more happen every day around the world. Some conclusions regarding data theft are:
Data theft cannot be prevented, but such incidents can be managed better.
Data theft can breach even the strongest of systems, because data theft methodologies evolve along with the technology.
If companies like Experian and JPMorgan Chase can be compromised, then almost nothing is secure.
Data theft protection needs to focus on other dimensions as well, not only on protecting the data.
Data theft can strike anytime and anywhere, and it is almost impossible to have 100 percent security against it, so the approach towards data theft needs to change. Hadoop and Big Data can play important roles in identifying data theft. A few companies are also working on finding a reliable data theft solution. Note that they are not even trying to prevent data theft. Instead, they are working on two things:
Identifying the data theft as quickly as possible, so that the data can be tracked without wasting any time.
Tracking the stolen data on the internet and on the dark web.
The assumption behind these data theft solutions is that it is impossible to stop data theft. The best approach is to assume that data theft is inevitable and to start looking for the data before it is lost. There is a subtle difference between stealing data and stealing tangible goods. Unlike with tangible goods, a thief can only steal a copy of the data, and the original data can help track that copy. It’s all about comparing the original and its copy.
In order to match the original with its copy, one needs to generate a hash of the original and compare it against a hash of the copy. For a given chunk of data, the hashing algorithm will always generate the same hash, and with a good algorithm two different chunks are extremely unlikely to share one. In simple words, the hash is not a secret code hidden in the data. It is a computation done on the data itself. The original data is divided into chunks, and each chunk is passed through a mathematical function which runs the computation on the chunk and generates the hash. After this, you crawl the web and match the generated hashes against hashes of the data found there. If two hashes match, then congratulations, you just found your stolen data.
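The chunk-and-hash matching described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular vendor does it: each customer record is treated as one chunk, SHA-256 is used as the hashing function, and the crawled page is a plain string (the crawling itself is not shown). The function names and the sample records are hypothetical.

```python
import hashlib


def hash_chunk(chunk: bytes) -> str:
    """Return the SHA-256 hex digest of one chunk of data."""
    return hashlib.sha256(chunk).hexdigest()


def build_hash_index(records: list[str]) -> set[str]:
    """Hash every original record (assumption: one record = one chunk)."""
    return {hash_chunk(r.strip().encode("utf-8")) for r in records}


def find_stolen_records(hash_index: set[str], crawled_text: str) -> list[str]:
    """Return the lines of crawled text whose hash appears in the index."""
    matches = []
    for line in crawled_text.splitlines():
        candidate = line.strip()
        if candidate and hash_chunk(candidate.encode("utf-8")) in hash_index:
            matches.append(candidate)
    return matches


# Hypothetical original records held by the organization.
original_records = [
    "1001,alice@example.com,Alice Smith",
    "1002,bob@example.com,Bob Jones",
]
index = build_hash_index(original_records)

# Hypothetical text found by a crawler.
crawled_page = (
    "for sale:\n"
    "1002,bob@example.com,Bob Jones\n"
    "9999,nobody@example.com,No One\n"
)
print(find_stolen_records(index, crawled_page))
# → ['1002,bob@example.com,Bob Jones']
```

Because changing even a single character of a chunk produces a different hash, exact matching like this only finds verbatim copies of a record. At the scale of a real web crawl, the same comparison would typically be distributed across many machines, which is where a platform such as Hadoop comes in.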