One of the cornerstones of all financial companies is the protection of their customer information. Should this information be breached, a company would face not only long-term damage to its reputation, but also potential penalties under directives like PCI DSS, HIPAA and Massachusetts' recently enacted privacy law.
While encryption could be used, this method of defense is expensive and leaves the information in a format unusable for day-to-day work. This leaves one alternative for "obscuring" the information: data masking. Data masking leaves information in a format that can be used for development and quality testing while still providing enough valid information to service a company's customers. But what are the key considerations when companies begin looking at data masking? Let's look at some data masking best practices.
Determine the project scope
One of the first tasks required when looking at any data protection mechanism is to understand and set the scope of deployment. Data masking best practices require that companies know what information needs to be protected, who is authorized to see it, which applications use the data and where it resides, both in production and non-production domains. While this may seem easy on paper, the complexity of operations and the many lines of business most financial companies have mean that identifying sensitive information, applications and the personnel who touch that information may require a substantial effort.
In addition, deciding if an employee is authorized to access customer information isn't just a "yes" or "no" decision. Customer service representatives (CSRs), for example, may need access to some of a customer's information for verification purposes, but not necessarily all of it. A CSR may have to ask for the last four digits of the customer's Social Security/tax ID number or their billing ZIP/postal code to ensure the caller is actually the customer. While CSRs need this information to do their job, they probably don't need access to the full Social Security/tax ID number or the full mailing address. Determining to what extent the information must be obscured, yet still be useful for business purposes, can be tricky and usually requires legal/compliance department involvement or review.
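As a rough illustration of this kind of partial exposure, the Python sketch below shows masking that reveals only what a CSR needs for verification. The function names are hypothetical, not taken from any particular product:

```python
import re

def mask_ssn(ssn: str) -> str:
    """Mask a Social Security number, exposing only the last four digits.

    Hypothetical helper for a CSR verification screen: the agent sees
    enough to confirm a caller's identity without full access to the SSN.
    """
    digits = re.sub(r"\D", "", ssn)  # strip dashes and spaces
    if len(digits) != 9:
        raise ValueError("expected a 9-digit SSN")
    return "###-##-" + digits[-4:]

def mask_zip(zip_code: str) -> str:
    """Expose only the five-digit billing ZIP, hiding any +4 extension."""
    return zip_code.split("-")[0]

print(mask_ssn("404-30-5698"))  # ###-##-5698
```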
Decide on data masking techniques
The next step is to determine which data masking features will be used on the information. Data masking offers many data manipulation capabilities, but not all of them preserve valid business context. These capabilities include:
- Non-deterministic randomization: Replacing a sensitive field with a randomly generated value, subject to constraints that keep the data valid, such as not giving February 30 days. For example, changing the date 12/31/2009 to 01/05/2010.
- Blurring: Adding a random variance to the original value; for example, replacing a savings account value with a random value but within an 8% range of the original.
- Nulling: Replacing a value with a null symbol; for instance, replacing a Social Security number of 404-30-5698 with ###-##-5698.
- Shuffling: Shuffling the order of a value, such as changing a zip code of 12345 to 53142.
- Repeatable masking: Maintaining referential integrity by generating values that are both repeatable and unique. For example, replacing the tax id number 24-3478987 with 26-3245870 consistently.
- Substitution: Randomly substituting original values using a substitution table of values, such as substituting "Jane Doe" with "Mary Smith" from a list of 100,000 given names and surnames.
- Specialized rules: These rules are for particular fields such as Social Security/tax ID numbers, credit card numbers, street addresses and telephone numbers that must remain structurally correct for workflow and checksum validation. As an example, substituting 100 Wall St., New York, N.Y., for 50 Maple Lane, Newark, N.J., where each random value -- house number, street, city and state -- makes up a valid address that can be found using applications like Google Maps or MapQuest.
- Tokenization: A special form of data masking in which the algorithm used to mask the data is retained so the information can later be restored to its original value. This is useful, for instance, when stored information must be recovered for disaster recovery purposes or must pass through untrusted domains between business operations.
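To make a few of these techniques concrete, here is a minimal Python sketch of blurring, nulling, shuffling and repeatable masking. The function names, and the use of a keyed SHA-256 hash for repeatable masking, are illustrative assumptions rather than a reference to any particular masking tool:

```python
import hashlib
import random

def blur_amount(value: float, pct: float = 0.08) -> float:
    """Blurring: replace a value with a random one within +/- pct of the original."""
    return round(value * (1 + random.uniform(-pct, pct)), 2)

def null_ssn(ssn: str) -> str:
    """Nulling: replace all but the last four digits with a null symbol."""
    return "###-##-" + ssn[-4:]

def shuffle_zip(zip_code: str, seed: int) -> str:
    """Shuffling: deterministically reorder the digits of a value."""
    digits = list(zip_code)
    random.Random(seed).shuffle(digits)
    return "".join(digits)

def repeatable_mask(tax_id: str, secret: bytes) -> str:
    """Repeatable masking: the same input and secret always yield the same
    masked value, preserving referential integrity across data sets."""
    digest = hashlib.sha256(secret + tax_id.encode()).hexdigest()
    digits = str(int(digest, 16))[:9]  # take nine decimal digits from the hash
    return digits[:2] + "-" + digits[2:]
```

Note the contrast between the first function, which is intentionally non-repeatable, and the last, whose output depends only on the input and the secret.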
Consider referential integrity needs
A data masking best practice, often missed during an initial data masking deployment, is the need for enterprise referential integrity. At the enterprise level, referential integrity is usually required to roll up information for line of business needs and resource sharing. This means that each "type" of information coming from a line of business application must be masked using the same algorithm/seed value.
For example, if the data masking system for line of business A's application masks a customer birthday as 1/05/10, then the data masking system for line of business B's application must mask the same input value as 1/05/10. This allows an enterprise-level application to correlate the masked birth dates and act upon the rest of the data coming from the two lines of business applications. If this "workflow" approach to masked information is not considered at the initial, or even the second, data masking tool deployment, substantial alignment and re-masking of information will be required, unless the financial company has little interaction between its lines of business, which is generally not the case.
However, in many large financial companies, a single data masking tool used across the entire enterprise isn't generally feasible. Each line of business may be required to implement its own data masking due to geographic disparity, budget or business requirements, different IT administration groups, or differing security/regulatory requirements. While this doesn't impact general masking activities, it can make "workflows" extremely difficult if the various data masking tools are not somehow synced. For instance, randomizing birth dates may be perfectly acceptable for one line of business application, but in another, the masked birth date must fall within a predefined range -- say, over 21 years of age -- for the application to consider the record valid.
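One common way to keep independently deployed masking tools synced is for each to derive its masked values from the same shared secret, so identical inputs mask identically everywhere. The sketch below assumes an HMAC-based approach and a hypothetical shared key; it also clamps the result to satisfy an over-21 validity rule like the one described above:

```python
import datetime
import hashlib
import hmac

# Hypothetical shared secret distributed to every line of business's masking tool.
SHARED_KEY = b"enterprise-masking-key"

def mask_birth_date(birth_date: datetime.date,
                    key: bytes = SHARED_KEY,
                    min_age_years: int = 21) -> datetime.date:
    """Deterministically shift a birth date so every tool holding the same
    key produces the same masked value for the same input."""
    digest = hmac.new(key, birth_date.isoformat().encode(), hashlib.sha256).digest()
    shift = int.from_bytes(digest[:4], "big") % 365 - 182  # bounded +/- ~6 months
    masked = birth_date + datetime.timedelta(days=shift)
    # Clamp so the masked record still passes an "over 21" validity check
    # (approximate cutoff; a real tool would handle leap years precisely).
    cutoff = datetime.date.today() - datetime.timedelta(days=365 * min_age_years)
    return min(masked, cutoff)
```

Because the shift is derived from the key and the input alone, two lines of business running separate copies of this code will produce identical masked dates without ever exchanging customer data.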
Secure the data masking algorithms
Finally, when rolling out data masking, the last decision to be made is how to protect the seed values, or algorithms, used by the data masking tool. Under the basic tenet of only allowing authorized users access to authorized information, these values should be considered extremely sensitive. If someone learns which repeatable masking algorithms are being used, he or she can reverse engineer large blocks of sensitive information. A data masking best practice is to employ separation of duties: IT security personnel determine which methods and algorithms will be used, and are granted access to the data masking tool only at initial deployment to set up the values, but not afterwards.
With IT security not having access to day-to-day operations, and IT support personnel having no access to the algorithms, a strong "separation of duties" control can be put in place. But if the data masking tool doesn't provide this administrative separation, then the IT support personnel must go through periodic background checks and system access must be closely audited to ensure the algorithms aren't exposed.
Plan for the future
Data masking does have great advantages. If desired, enterprise applications themselves can be rewritten to do masking without a separate tool, since their main function is usually some form of data manipulation. Masked information is readable, and if the right masking function was used, even valid production business workflows can be tested with "production-like" information. Customer service applications such as help desks also don't need presentation-level functions that blank out portions of the screen, since the masked data itself hides sensitive information on behalf of the application. Even printing can be done without worrying about who's standing at the printer if customer information is masked before it's printed. But implementing data masking isn't as easy as adding a module to an existing application or rolling out a system dedicated to the task. As with any data protection mechanism, planning, architecture and a vision of how the business will operate in the future are needed before the first piece of information is masked.
About the author:
Randall Gamby is an enterprise security architect for a Fortune 500 insurance and finance company who has worked in the security industry for more than 20 years. He specializes in security/identity management strategies, methodologies and architectures.
This was first published in August 2010