Brandon Harris

Cloud + Data Engineering + Analytics

This is a series of articles in 5 parts. Each part investigates a specific challenge that organizations usually run into, and the final part wraps up with some of my additional thoughts and suggestions. You can navigate the series using the links in the list below.

Challenge #5 – Lift-and-Shift vs Refactoring
Challenge #4 – Why no one wants to use your new tools / platform!
Challenge #3 – Solving for Compliance
Challenge #2 – Organizational Rigidity
Challenge #1 – A Lack of Analytics Support (investment and success are correlated).

Challenge #3 – Solving for Compliance: Data Architecture and Governance Considerations

In the current landscape of data-driven, digital business operations, ensuring compliance has become a priority for organizations across industries. As data privacy concerns continue to grow and standards and regulations such as SOC, HIPAA, PCI, and SOX continue to evolve, a robust data management practice has never been more important. In this blog post we’re going to explore data architecture considerations as they relate to supporting compliance, as well as a few recommendations for data governance strategies. I’ll also touch upon how the choice of cloud data platform can directly influence how easily an organization can support compliance initiatives.

What do we mean by compliance? In general, compliance refers to an organization’s ability to conform to guidelines, laws, and standards. In practice, compliance is achieved through policies and controls that govern how an organization manages and protects its data. Non-compliance can result in lawsuits and heavy financial penalties, as well as significant reputational risk.

Data Architecture Considerations

A successful compliance practice starts with a well-designed data architecture, enabling the organization to manage and protect its data effectively. Here are a few important considerations:

  1. Data Segregation and Classification: Data should be classified based on its sensitivity and the level of protection required, such as public, internal, confidential, and restricted. A classification system helps control access to data and ensures it is handled appropriately. One pitfall here: it may be tempting to include the classification level in your naming standards, such as naming a database view “employee_address_confidential”. However, this paints a very large target on your data for bad actors. You don’t want to go out of your way to direct bad guys to your most valuable information.

  2. Built-in Data Security: Security should be an inherent part of your data architecture. Encryption of data at rest and in transit, robust access control mechanisms, intrusion detection systems, and regular vulnerability assessments are some elements that you’ll want to spend some time thinking through. Another tip here: When it comes to object storage and encryption, make sure you understand and have thought through the implications of using keys controlled by a cloud provider vs your own KMS implementation. Who do you want to ultimately own the keys to your data, and are you comfortable if it’s controlled by a 3rd party?

  3. Auditability: The ability to audit who accessed what data, when, and from where is a crucial compliance requirement. Implementing comprehensive logging and monitoring mechanisms that record data access and modifications should be on your radar early on.

  4. Data Residency: Many countries are starting to enact requirements on how and where data is stored and processed. How will your architecture handle data that needs to be used together, but that has to reside in different physical locations?
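To make items 1, 3, and 4 more concrete, here is a minimal Python sketch, assuming a hypothetical in-memory catalog. `Classification`, `Dataset`, `AuditLog`, `can_read`, and `read_dataset` are illustrative names rather than any product’s API, and the region strings are placeholders. The classification lives in metadata instead of the dataset name, every access attempt (allowed or denied) is recorded with who, what, when, and from where, and reads are refused when the processing region differs from where the data must reside.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import IntEnum
from typing import Any, Dict, List, Tuple


class Classification(IntEnum):
    # Ordered so that a higher value means more sensitive data.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class Dataset:
    name: str                       # neutral name with no hint of sensitivity
    classification: Classification  # stored as catalog metadata, not in the name
    region: str                     # where the data is required to reside


@dataclass
class AuditLog:
    events: List[Dict[str, Any]] = field(default_factory=list)

    def record(self, user: str, dataset: str, source_ip: str,
               allowed: bool, reason: str) -> None:
        # Capture who, what, when, from where, and the outcome of each attempt.
        self.events.append({
            "user": user,
            "dataset": dataset,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "source_ip": source_ip,
            "allowed": allowed,
            "reason": reason,
        })


def can_read(dataset: Dataset, clearance: Classification,
             processing_region: str) -> Tuple[bool, str]:
    # Classification check: clearance must meet or exceed the data's level.
    if clearance < dataset.classification:
        return False, "insufficient clearance"
    # Residency check: only process the data in the region where it must reside.
    if processing_region != dataset.region:
        return False, f"residency: must be processed in {dataset.region}"
    return True, "ok"


def read_dataset(dataset: Dataset, user: str, clearance: Classification,
                 processing_region: str, source_ip: str, log: AuditLog) -> bool:
    allowed, reason = can_read(dataset, clearance, processing_region)
    log.record(user, dataset.name, source_ip, allowed, reason)  # denials are logged too
    return allowed


log = AuditLog()
addresses = Dataset("emp_addr_v2", Classification.CONFIDENTIAL, "eu-west-1")
print(read_dataset(addresses, "alice", Classification.RESTRICTED, "eu-west-1", "10.0.0.5", log))  # True
print(read_dataset(addresses, "bob", Classification.INTERNAL, "eu-west-1", "10.0.0.9", log))      # False
print(read_dataset(addresses, "alice", Classification.RESTRICTED, "us-east-1", "10.0.0.5", log))  # False
print(len(log.events))  # 3
```

Using an `IntEnum` lets one comparison express “clearance must be at least the data’s classification.” In a real platform these checks would be enforced by the catalog, access control, and region-pinned storage, and the audit trail would go to append-only storage rather than a Python list.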

Data Governance Approaches

A comprehensive data governance strategy is a foundation for ensuring compliance. If you’ve ever tried to convince a CEO to invest in data governance, however, you know it can be a difficult sell. There’s nothing inherently sexy about data governance (apologies to my data governance friends!). However, it’s well worth the effort: a sound data governance practice ensures that data is managed as a strategic asset and allows technology organizations to cohesively deliver solutions for data quality, metadata management, and data privacy. Data governance is a very big topic, but here are a few considerations if you’re embarking on this journey or looking to reboot a stalled effort.

  1. Designate Data Stewards: Data stewards ensure data is used and maintained according to defined policies and procedures, contributing to a culture of data compliance. The best data stewards, in my opinion, are from the business teams as they have the best understanding of how the data is used and how it drives decision making. Bribe them with free lunches if you must (from experience, a Chipotle catered taco bar is usually a winner).

  2. Institute a Data Governance Council: This council is responsible for establishing policies and procedures for data management and ensuring they align with compliance requirements. Again, do not do this in a technology vacuum; ensure buy-in and collaboration from technical and non-technical organizations alike.

  3. Establish Data Policies and Procedures: Data policies and procedures should clearly articulate how data should be collected, stored, used, and disposed of. This includes defining roles and responsibilities around data management. For leaders in larger companies, and especially public companies, this should be foundational. If you don’t have these in place, create them now. Imagine an unfortunate data security incident on your watch… you really don’t want to be in a position to hear someone say, “You mean we don’t have any documented standards on how we collect and store data?”
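Parts of a documented policy can also be expressed as data that pipelines check, which helps keep practice from drifting away from what is written down. The following is a minimal, hypothetical Python sketch of a retention-and-disposal rule with an accountable steward; `RetentionRule`, `POLICY`, `is_due_for_disposal`, the categories, the retention periods, and the steward names are all illustrative placeholders, not recommendations.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict


@dataclass(frozen=True)
class RetentionRule:
    category: str     # data category the rule covers
    retain_days: int  # how long records may be kept after collection
    owner: str        # accountable data steward for this category


# Hypothetical policy entries; real periods come from legal and compliance review.
POLICY: Dict[str, RetentionRule] = {
    rule.category: rule
    for rule in (
        RetentionRule("web_logs", 90, "marketing-steward"),
        RetentionRule("payroll", 7 * 365, "finance-steward"),
    )
}


def is_due_for_disposal(category: str, collected_on: date, today: date) -> bool:
    rule = POLICY.get(category)
    if rule is None:
        # An undocumented category is a policy gap: fail loudly instead of guessing.
        raise KeyError(f"no retention policy documented for {category!r}")
    return today - collected_on > timedelta(days=rule.retain_days)


print(is_due_for_disposal("web_logs", date(2024, 1, 1), date(2024, 6, 1)))  # True
print(is_due_for_disposal("payroll", date(2024, 1, 1), date(2024, 6, 1)))   # False
```

Raising on an unknown category is a deliberate choice in this sketch: data with no documented policy surfaces as an error instead of being silently kept or deleted.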

Leverage Cloud Data Platforms

We’ve come a long way since the days of DB2 and green-screen SQL. Today’s cloud data platforms provide a fantastic array of native compliance capabilities, simplifying the journey to compliance. A few of the platforms I’ve used in past roles have some great capabilities that I’ll mention below.

  1. Snowflake: Probably my favorite cloud data platform, for a number of reasons! With built-in object tagging and dynamic data masking, Snowflake makes it straightforward to classify and protect sensitive data. Object tagging allows users to assign descriptive tags to data objects, aiding in the classification and tracking of data. Dynamic data masking limits sensitive data exposure by masking it for users without the necessary privileges.
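  2. Amazon Redshift: Redshift integrates with AWS CloudTrail, which logs Redshift API calls, and also offers database audit logging of connections, user changes, and user activity such as queries. Together these make it easier to track user activity and data usage and to maintain an auditable record of actions.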
  3. Google BigQuery: BigQuery has a number of features that support compliance efforts, such as query monitoring through Cloud Audit Logs, policy tags for column-level access control, and Cloud DLP (data loss prevention) for discovering and classifying sensitive data.
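Dynamic masking is easier to reason about with a small model. The Python sketch below is not Snowflake’s API; it is a hypothetical, in-memory illustration of the same idea: columns carry classification tags, each tag maps to a masking rule, and values are masked at read time unless the caller’s role is privileged. In Snowflake itself this is expressed in SQL, typically a `CREATE MASKING POLICY` whose body checks `CURRENT_ROLE()`, attached to a column directly or through a tag. The names `COLUMN_TAGS`, `MASKING_RULES`, `apply_masking`, the tag values, and the roles (`HR_ADMIN`, `FINANCE`, `ANALYST`) are placeholders.

```python
from typing import Callable, Dict, Optional, Set, Tuple

# Column tags: classification lives in metadata, not in column names.
COLUMN_TAGS: Dict[str, str] = {
    "email": "pii",
    "salary": "confidential",
    "department": "internal",
}

# One masking rule per tag: roles that may see clear text, and the mask for everyone else.
MaskRule = Tuple[Set[str], Callable[[object], object]]
MASKING_RULES: Dict[str, MaskRule] = {
    "pii": ({"HR_ADMIN"}, lambda value: "*****"),
    "confidential": ({"HR_ADMIN", "FINANCE"}, lambda value: None),
}


def apply_masking(row: Dict[str, object], role: str) -> Dict[str, object]:
    """Return a copy of row with tagged columns masked for roles lacking privileges."""
    masked: Dict[str, object] = {}
    for column, value in row.items():
        tag: Optional[str] = COLUMN_TAGS.get(column)
        rule = MASKING_RULES.get(tag) if tag is not None else None
        if rule is None:
            masked[column] = value  # untagged, or tag without a masking rule
            continue
        allowed_roles, mask = rule
        masked[column] = value if role in allowed_roles else mask(value)
    return masked


row = {"email": "jane@example.com", "salary": 120000, "department": "Engineering"}
print(apply_masking(row, "ANALYST"))  # {'email': '*****', 'salary': None, 'department': 'Engineering'}
print(apply_masking(row, "FINANCE"))  # {'email': '*****', 'salary': 120000, 'department': 'Engineering'}
```

Keying the rules by tag rather than by column means a newly tagged column picks up the right protection without writing a new rule, which is the main appeal of combining tagging with masking.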

Having ridden the compliance rodeo a few times myself, I know the road might seem daunting. Investing in the right data architecture and data governance strategy alongside the right cloud data platform can significantly ease the journey. The key lies in a systematic approach that combines proactive and iterative planning, effective technology utilization, and a commitment to continuously evolving the body of work. In our increasingly data-driven world, compliance is not just a defensive maneuver; it’s a cornerstone of building trust with our customers and employees.