Data has become the engine that drives modern business, and collating and analysing that data is a crucial component of many IT departments’ duties. Most turn to Enterprise Data Warehouse (EDW) technologies, which offer platforms that allow business to centralise their data for easier analysis and processing.
Teradata is among the most well-known EDW platforms on the market, having spent the last 40 years building its reputation providing on-premise EDW hardware and software for customers including General Motors, P&G, eBay and Boeing. It has now transitioned to a cloud-first model and is now available on all three major public cloud providers, following the addition of Google Cloud Platform support on 22 October 2019.
Back in 2017, however, the company’s cloud credentials were not so well-established. That’s why when healthcare and pharmaceuticals giant Johnson & Johnson (J&J) decided to move its data stores to a Teradata-powered cloud infrastructure, the plan was met with surprise and skepticism. In the years leading up to the project, J&J’s senior manager for data and analytics Irfan Siddiqui says, the company became aware its current on-premise platform would not support its burgeoning data analytics requirements demands at an affordable price for very much longer.
“We [had] been experiencing some challenges and thinking about how we transform the traditional data warehouse into a more modern service, particularly around the flexibility, scalability and cost, and we were searching for a solution,” he told a Teradata conference in Denver, Colorado earlier this year.
And so, in 2017 it started to look at migrating its enterprise data warehouse (EDW) system to the cloud, eventually landing on Teradata as the most promising solution provider for its problems.
At that time, the offer of Teradata on AWS was not widely considered mature enough for an enterprise environment, Siddiqui tells Cloud Pro.
Five lessons from Johnson & Johnson’s EDW cloud migration
Identify all the stakeholders involved and begin discussions to identify potential challenges
Start with a small proof of concept to test all aspects of the potential solution
Understand as early as possible the network bandwidth and latency between your on-premise and cloud solutions
Expect some things to go wrong the first time you try them
Engage a strong project manager, who is good with timelines and risk, to be the single point of contact for communicating progress
Practise processes over and over again, including failure scenarios
“When Teradata released its first machine on AWS, and I said I wanted to do a proof of concept for Teradata in the cloud, people who knew Teradata, their first reaction was, ‘What? Why? Really?’.”
However, the commitment from Teradata to show its systems could work in the cloud was so strong Siddiqui found the confidence to go into a proof of concept. Initial trials showed promise.
The 80-terabyte a-ha moment
“Most of us know doing a capacity expansion or migration to new hardware takes in the order of six months but [with AWS] we were able to spin up a formal system with 80TB of data in just 20 minutes. That was one of the ‘a-ha moments’ for us which became the driving force for us to take another step,” he says.
J&J set itself five goals in lifting Teradata to the cloud, Siddiqui says: to migrate three data environments and all its applications by the halfway point of 2019; to offer the same or improved performance compared with the on-premise system; and to increase flexibility and scalability while reducing cost.
This posed a sizeable challenge for Siddiqui’s team, which aimed to support about 300TB of storage, 50 business applications and 2,500 analytics users on to a system capable of handling more than 200 million queries per month.
It also raised some significant questions.
“How are our applications going to perform? How do we migrate? What happens with downtime, and stability and security?” he says. “We had to address these questions, not just for our leadership team, but all the stakeholders across J&J. We had to show how it would benefit each one of us.”
Most applications stay on-prem
Although all the data warehouse workloads would be in the cloud, most of the related analytics applications and data visualisation tools, including Qlik, Talend, Informatica, and Tibco, remained on-premise.
Some applications were split between the cloud and on-premise servers. For example, J&J wanted to spin up application development environments in the cloud when they were required and only pay when using them. “That is the flexibility we did not have our own servers,” Siddiqui says.
Given the migration had to follow an upgrade to the data warehouse production environment, deadlines became tight. The team worked for three months more or less continuously. But by the end of June of 2019, it was able to decommission the on-premise data warehouse hardware systems.
The hard work has paid off for Siddiqui and his team. Extract-transform-load jobs now take half the time compared to the on-premise system. Large Tableau workload performance has improved by 60% and another application’s data loading was cut from more than three hours to 50 minutes.
Beware the desktop data hoarders
Claudia Imhoff, industry analyst and president of Intelligence Solutions, says it makes sense to put enterprise data warehousing in the cloud in terms of scalability and performance, but there are caveats.
“It’s a wonderful place if you have all the data in there. But, unless you’re a greenfield company, nobody has all of their data in the cloud. Even if most operational systems are in the cloud, there are so many little spreadsheets that are worth gold to the company, and they’re on somebody’s desktop,” she says.
“There are arguments for bringing the data into the cloud. It is this amorphous thing, and you don’t even know where the data is being stored. And you don’t care, as long as you get access to it. Some of it’s in Azure, some of it’s in AWS, and some of it is in fill-in-the-blank cloud. And, by the way, some of it is still on-premise. Can you bring the data together virtually and analyse it? Good luck with that,” she adds.
To succeed in getting data warehousing and analytics into the cloud, IT must convince those hoarding data on desktop systems that it is in their interest to share their data. The cloud has to do something for them, she says.
Despite the challenges, enterprise IT managers can expect to see more data warehouse deployments in the cloud. In April, IDC found the market for analytics tools and EDW software hosted on the public cloud would grow by 32% annually to represent more than 44% of the total market in 2022. These organisations will have plenty to learn from J&J’s data warehouse journey.