Brandon Harris

Cloud + Data Engineering + Analytics

This is a series of articles in 5 parts. Each part investigates a specific challenge organizations usually run into, and the final part wraps up with some of my additional thoughts and suggestions. You can navigate the series using the links in the list below.

Introduction

I’ve been at the intersection of technology and data for over 20 years. I’ve spent half of that, a solid decade, helping to build, transform and lead analytics organizations. One of the things that always amazes me is how consistent the challenges that organizations face are. The same challenges show up again and again, regardless of the size of the company or the industry they’re in. You might find this surprising (and perhaps comforting, if you’re coming here for answers), but even the largest, most well-funded and well-run companies stumble on some of these challenges. I’ve seen these challenges eat away at organizational trust, derail projects and stifle transformation efforts across Fortune 10 juggernauts, Series A startups and everything in between.

I think of these challenges the way we think of the common cold. No one is truly immune, but there are steps that can be taken to minimize the impact and speed up the recovery. The speed and agility with which an organization can move through and beyond these challenges can be a powerful differentiator. If your competitor is stuck for 2 years building a cloud data lake and you can get it done in 6 months, who do you think is going to get to those important customer insights and trends first?

Regardless of the challenge for your organization, I believe there’s a common theme that underlies a lot of these items. I’ll talk more about that towards the end of the series, but for now, let’s examine our Top 5 list of analytics challenges that most organizations will face. I’ll also include some suggestions based on my first-hand experience working through them, in some cases multiple times across different organizations.

Challenge #5 – Lift-and-Shift vs Refactoring
Challenge #4 – Why no one wants to use your new tools / platform!
Challenge #3 – Solving for Compliance
Challenge #2 – Organizational Rigidity
Challenge #1 – A Lack of Analytics Support (investment and success are correlated).

Challenge #5 – Lift-and-Shift vs Refactoring

You’ve got the executive support and funding to move the organization’s data to a new “next-gen” platform, probably with a cloud provider. Do you invest in refactoring your data, which is slow but offers better long-term value? Or do you take the fast route and lift-and-shift your existing databases / warehouses to the new platform, at the cost of carrying over the same data limitations and challenges? You make a call, and then we fast forward 6 months. You’ve managed to get a PB+ of data on the cloud, which is a huge achievement, and yet no one is happy.

Common Pain Points

  • Difficulty providing basic dashboards that include a complete view of business operations (or if you are doing it, you’re using a “cheater” data set to provide that end-to-end view).
  • Can’t easily join or match data across domains.
  • You have 90% of the data you need, but all of your business teams tell you they can’t do anything with it. You always seem to be missing that critical 10%.
  • Lack of adoption for your new set of cloud tools or data lake. People won’t give up their Crystal / Qlik / BusinessObjects / SAS / Cognos / MicroStrategy / BIRT / Spotfire.

The Way Forward

No matter which starting point an organization chooses (lift-and-shift vs refactoring), invariably about 6-12 months in they recognize the pitfalls and then try to quickly make up for it by accelerating down the other path. So what’s the answer? Which path do you start with? The answer is deceptively simple.

Both.

The most effective path forward for this challenge is to always start by taking both paths simultaneously, but with one important trick I’ll share in a minute that will bring it all together.

First though, spend some quality time with your DBAs and identify which warehouses / tables / datasets are the most frequently accessed, as well as their most commonly joined tables / datasets. Dig into your reporting environments and see which reports are used the most and figure out what their underlying data sources are. This investigation should define your lift-and-shift population. Typically, this will include operational data, customer data, product or service data, as well as sales data.
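As a rough illustration of that investigation, here is a minimal sketch that ranks tables and join pairs by how often they show up in a query log. Everything in it is an assumption made for the example: the `QUERY_LOG` sample, the table names, and the naive regular expression, which only looks at what follows `FROM` or `JOIN` and will miss subqueries, CTEs and quoted identifiers. In practice the input would come from whatever query history your warehouse exposes.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical sample of a query log; real input would come from your
# warehouse's query history or audit tables.
QUERY_LOG = [
    "SELECT * FROM sales s JOIN customer c ON s.cust_id = c.id",
    "SELECT count(*) FROM sales",
    "SELECT * FROM sales s JOIN product p ON s.prod_id = p.id",
    "SELECT * FROM customer c JOIN sales s ON c.id = s.cust_id",
]

# Naive assumption: a table name is whatever follows FROM or JOIN.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def rank_tables(queries):
    """Count how often each table, and each pair of tables, appears per query."""
    table_hits = Counter()
    join_hits = Counter()
    for sql in queries:
        # De-duplicate and sort so (customer, sales) and (sales, customer)
        # are counted as the same pair.
        tables = sorted({name.lower() for name in TABLE_REF.findall(sql)})
        table_hits.update(tables)
        join_hits.update(combinations(tables, 2))
    return table_hits, join_hits


table_hits, join_hits = rank_tables(QUERY_LOG)
print(table_hits.most_common(3))
print(join_hits.most_common(3))
```

The tables and pairs at the top of these two rankings are reasonable first candidates for the lift-and-shift population.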

Author’s Note - Make sure you have some guidance from a data governance / compliance expert depending on the scope and context of this data. You definitely don’t want to break or impact any existing privacy solutions, audit controls or compliance processes for SOX, SOC 1/2, HIPAA, PCI DSS, GDPR, CCPA, etc…

Next up, partner with your Data Architects and start building out optimized data models. If your company isn’t big enough to have a data architect, you can try your hand with any number of ERD tools (Erwin, Trevor.io, DBDiagram.io) but I’d strongly recommend having at least one conversation with an analytics-focused consultancy to get a feel for what you’re getting into. If your organization hasn’t ever been strong in Master Data Management, or never bothered with clearly defined data domains, now would be a great opportunity to invest some time and thought. If you have 20 different transactional systems that process customer orders, this is the perfect moment to think about how you unify this at a data model level. One important pitfall I want to call out here is to not “big bang” all of these models. Work iteratively, start small and shallow across your core domains (sales, customer, operations, etc…), then expand your full set of attributes and dimensions as you go.

Author’s Note - For a mature organization, I strongly suggest at least reviewing the Data Vault 2.0 design patterns. Data Vault 2.0 is a brilliant approach that merges the best of OLAP / OLTP models and provides an amazing foundation to build from. The trade-off is that it can be complex initially, requiring a lot of effort up-front to deliver, so it’s not for every organization.

Finally, and most importantly, let’s talk about the critical piece that allows all of this to work well and will save you and your team a ton of time. Views. Yep, that’s it. Using the target-state data models defined in your architectural discussions, create those views and point them to your lift-and-shift data.

The view approach is needed for two reasons.

1) It allows you to pressure-test your shiny-new target-state data model for any issues or scenarios you may not have envisioned.
2) The view allows you to gradually transition between lift-and-shift data and your new data model, and to deliver a seamless experience to your customers. They don’t have to re-do any queries / reports or other work they may have built on the lift-and-shift data. In fact, they shouldn’t even notice a change if this is done properly.
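To make the view idea concrete, here is a small sketch using Python's built-in `sqlite3` module as a stand-in for a real warehouse. The table, view and column names (`legacy_ord_hdr`, `sales_order`, `fact_sales_order` and so on) are hypothetical and chosen only for illustration. The view first exposes the target-state column names over the lift-and-shift table, and is later repointed at a refactored table while the consumer's query stays the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Lift-and-shift table, copied as-is from a legacy system (hypothetical).
    CREATE TABLE legacy_ord_hdr (ord_no INTEGER, cust_no INTEGER, ord_amt REAL);
    INSERT INTO legacy_ord_hdr VALUES (1, 100, 25.0), (2, 101, 40.0);

    -- Target-state model, exposed as a view over the legacy table.
    CREATE VIEW sales_order AS
    SELECT ord_no AS order_id, cust_no AS customer_id, ord_amt AS order_amount
    FROM legacy_ord_hdr;
""")

# The query a business team writes against the view; it never changes.
QUERY = "SELECT customer_id, order_amount FROM sales_order ORDER BY order_id"
before = conn.execute(QUERY).fetchall()

# Later, the refactored table is ready and the view is repointed to it.
# SQLite has no CREATE OR REPLACE VIEW, so the view is dropped and recreated.
conn.executescript("""
    CREATE TABLE fact_sales_order (
        order_id INTEGER, customer_id INTEGER, order_amount REAL
    );
    INSERT INTO fact_sales_order
    SELECT ord_no, cust_no, ord_amt FROM legacy_ord_hdr;

    DROP VIEW sales_order;
    CREATE VIEW sales_order AS
    SELECT order_id, customer_id, order_amount FROM fact_sales_order;
""")

after = conn.execute(QUERY).fetchall()
print("consumer sees the same result:", before == after)
```

On most cloud warehouses the swap would likely be a single replace-view statement rather than a drop and create, but the principle is the same: consumers bind to the view name, not to the physical tables behind it.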

Another tip that can save time: when working with your DBAs, identify the top 5-10 queries per organization or business unit. Translate those queries into your new data model / view and share them with those teams as you onboard them. This is their Rosetta Stone for their new data world. Having this will reduce their anxiety and cut down on the number of follow-up emails, calls and text messages you and your team will receive (don’t worry, you’ll still get plenty). It will also serve as a great initial testbed to ensure data accuracy / quality (did we get the same row count? Same summary totals?) as well as performance (why is the new query taking 10 seconds longer?).
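Those checks (row count, summary totals, timing) are easy to automate. The sketch below is one possible shape for such a testbed, again using `sqlite3` as a stand-in; the `profile` helper, the table and view names, and the assumption that the last column of each query is the numeric total worth comparing are all made up for this example.

```python
import sqlite3
import time


def profile(conn, sql):
    """Run a query and return (row_count, sum of the last column, seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    elapsed = time.perf_counter() - start
    total = sum(row[-1] for row in rows)
    return len(rows), total, elapsed


conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical lift-and-shift table and the target-state view over it.
    CREATE TABLE legacy_ord_hdr (cust_no INTEGER, ord_amt REAL);
    INSERT INTO legacy_ord_hdr VALUES (100, 25.0), (100, 15.0), (101, 40.0);
    CREATE VIEW sales_order AS
    SELECT cust_no AS customer_id, ord_amt AS order_amount FROM legacy_ord_hdr;
""")

# The team's original query and its translation to the new model.
old_sql = "SELECT cust_no, SUM(ord_amt) FROM legacy_ord_hdr GROUP BY cust_no"
new_sql = (
    "SELECT customer_id, SUM(order_amount) FROM sales_order GROUP BY customer_id"
)

old_rows, old_total, old_secs = profile(conn, old_sql)
new_rows, new_total, new_secs = profile(conn, new_sql)

print("row counts match:", old_rows == new_rows)
print("totals match:", abs(old_total - new_total) < 1e-9)
print(f"timing delta: {new_secs - old_secs:+.6f}s")
```

Running a pair like this for each of the top 5-10 queries gives every onboarding team a before-and-after comparison they can check for themselves.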

My last suggestion on this topic: be prepared to iterate! I don’t care if you’re Ralph Kimball, you’re not going to nail the target data architecture the first time through. This is another reason why the view approach is important. Accept up front that your refactored data models and assets are going to take a while to get into their ideal state. After all, I would wager that your current state warehouse / models have accumulated and changed over years and years (or in some of your cases, decades and decades). It’s unrealistic to expect instant perfection in a new environment, and this is an important point to cover when setting expectations with your leadership and your customers (you are having regular conversations with both… right..?!).

The pitfalls of moving to a new analytics or data platform can be avoided with enough foresight and planning. It can be a fine line to walk, but by providing short-term capabilities (lift-and-shift) and a long-term vision (target-state, analytics-ready data models), and then abstracting all of that with views, you’ve set yourself up for success.

Don’t forget to have a way to measure progress (# of records / tables, bonus points if you can tie the datasets to business unit P&L contribution) and to report out consistently to your leadership and business stakeholders. Telling the story of your value delivery and the progress your teams are making can be just as important as making progress in the first place!
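A progress measure does not need to be elaborate. As a minimal sketch, assuming you keep a simple inventory of datasets with a record count and a flag for whether each one has been refactored, the share of tables and records behind the new models can be computed like this. The `Dataset` class, the `INVENTORY` entries and their numbers are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    records: int
    refactored: bool  # True once the view points at the target-state model


# Hypothetical inventory; real numbers would come from your catalog.
INVENTORY = [
    Dataset("sales_order", 1_000_000, True),
    Dataset("customer", 200_000, True),
    Dataset("product", 50_000, False),
    Dataset("operations_log", 750_000, False),
]


def progress(inventory):
    """Return (% of tables refactored, % of records refactored)."""
    done = [d for d in inventory if d.refactored]
    tables_pct = 100 * len(done) / len(inventory)
    records_pct = (
        100 * sum(d.records for d in done) / sum(d.records for d in inventory)
    )
    return tables_pct, records_pct


tables_pct, records_pct = progress(INVENTORY)
print(f"tables refactored: {tables_pct:.0f}%")
print(f"records refactored: {records_pct:.0f}%")
```

Adding a business-unit or P&L field to each entry would let the same calculation be reported per stakeholder group.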