October 12, 2020

Modernizing a tax-season fixture - Intuit migrates TurboTax to Kubernetes and lives to tell the tale

Kurt Marko, Wed, 11/25/2020 - 08:29

Summary:
Intuit’s experience migrating TurboTax and related services to Kubernetes in time for the 2020 tax season makes for an excellent case study in cloud infrastructure engineering.

(Pixabay)

There are few businesses more seasonal than tax preparation. Whether it is CPAs poring over their clients’ records, DIY-ers using online tax software or government agencies processing millions of returns, most of the year’s activity is compressed into a few weeks between late January and mid-April. Note the word “online” here: as Intuit’s chart summarizing TurboTax sales illustrates, using software to automate and streamline the preparation of individual tax returns has become almost exclusively an online service. Such highly fluctuating workloads are an ideal fit for cloud infrastructure, something Intuit realized early on. According to Pratik Wadher, Intuit’s VP of Product Development, the company has completed its migration to AWS and closed its last data center earlier this year.

Although cloud infrastructure is more scalable than a fleet of on-premises servers, it still requires planning and effort to ensure that enough resources are available when needed. Indeed, Wadher says that even AWS EC2 auto-scaling isn’t fast enough to keep up with load spikes during peak hours of TurboTax usage. For Intuit, a better application deployment option pairs cloud infrastructure with Kubernetes container clusters. The 2020 tax season was the first in which the majority of the modules and services making up TurboTax ran on Intuit’s Kubernetes platform. The company’s route to Kubernetes was mostly smooth and offers instructive lessons for other large organizations considering a similar strategy.

(Source: Intuit Investor Day 2020 presentation.)

Moving from VMs to containers - easier with a services-oriented application design

Intuit is a diverse software company with a mix of consumer, professional, online and desktop products; however, it is best known for the TurboTax franchise, which delivers about 41% of its total revenue and 54% of its online sales. Indeed, with 48 million users this year, TurboTax accounted for 30% of all US tax returns. With such a sizeable user base, making TurboTax the pioneer for Intuit’s Kubernetes infrastructure was simultaneously the most logical and the riskiest choice: logical because it promised the most financial and operational upside if successful; risky because any hiccups would cause the most monetary and reputational damage.

(Source: Intuit Investor Day 2020 presentation.)

A recent blog post by an Intuit engineering team describes its experience migrating to Kubernetes, citing five objectives it set when it began moving from EC2 VMs to container clusters in 2019:

- Increase the pace of product development and release.
- Consolidate development teams on a single infrastructure platform while preserving environmental isolation for each product group.
- Increase resource utilization, particularly during peak times, to lower costs without compromising performance or reliability.
- Provide a unified distribution mechanism for reusable services and software components across products.
- Exploit a platform with a robust open source developer ecosystem that contributes new features, while offering Intuit developers an opportunity to participate in container- and DevOps-related open source projects.

Unlike many Kubernetes projects, Intuit jumped into the technology with both feet, starting with a complicated product that requires sizable compute resources. According to Intuit’s engineering team, TurboTax and its dependencies consist of 400 microservices, with 40 planned for Kubernetes. Handling its 40-50 million customers requires 26 Kubernetes clusters distributed between two AWS regions, each using three Availability Zones (AZs). Overall, TurboTax uses about 1,000 Kubernetes nodes that must scale from 5K to 300K transactions per second (TPS) within two hours.

Spreading clusters and the Kubernetes control plane across AZs is the standard way to provide redundancy within an AWS region. Although Intuit doesn’t offer the details, the AWS EKS managed Kubernetes service automatically provisions control nodes in multiple AZs and handles rerouting around failed nodes. AWS also documents three ways of creating multi-zone worker node clusters for EKS using auto scaling groups and a load balancer (ELB, ALB). Intuit does note that it uses the AWS Application Load Balancer (ALB) to distribute client requests to nodes within each cluster. For disaster recovery (DR), Wadher says Intuit uses a mix of active-active and active-passive designs for regional redundancy, where services can scale up capacity in either region. The goal is at least four 9s availability (99.99%, or about 4 minutes of downtime per month) for all microservices.

As cited in TurboTax moves to Kubernetes: An Intuit journey — Part 1:

Pre-production clusters are sliced by namespaces for QA (typically used for unit tests, build pipelines, etc.) and E2E (typically used for end-to-end product tests). Production clusters are sliced by namespaces for staging and production.
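The availability and scaling figures above can be sanity-checked with a few lines of Python. This is a back-of-the-envelope sketch, not Intuit’s capacity model: it assumes a 30-day month and treats the 5K-to-300K TPS ramp as a straight line, both simplifications of our own.

```python
# Back-of-the-envelope checks for the availability and scaling figures.
# Assumptions (not from Intuit): a 30-day month and a linear TPS ramp.

MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes in a 30-day month


def monthly_downtime_minutes(availability: float) -> float:
    """Allowed downtime per 30-day month for an availability fraction such as 0.9999."""
    return (1.0 - availability) * MINUTES_PER_MONTH


def average_ramp(start_tps: float, end_tps: float, hours: float) -> tuple[float, float]:
    """Return (growth multiple, average TPS added per minute) for a linear ramp."""
    multiple = end_tps / start_tps
    per_minute = (end_tps - start_tps) / (hours * 60)
    return multiple, per_minute


if __name__ == "__main__":
    print(f"four 9s: {monthly_downtime_minutes(0.9999):.2f} min/month")
    multiple, per_minute = average_ramp(5_000, 300_000, 2)
    print(f"5K -> 300K TPS in 2 h: {multiple:.0f}x, ~{per_minute:,.0f} TPS added per minute")
```

Under those assumptions, four 9s allows about 4.3 minutes of downtime a month, and the ramp is a 60x increase that averages roughly 2,458 additional TPS every minute for two hours.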

(Source: TurboTax moves to Kubernetes: An Intuit journey — Part 1)
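The namespace slicing quoted above maps onto ordinary Kubernetes Namespace objects. The sketch below builds them as Python dicts and prints them as JSON, which kubectl accepts as well as YAML. The cluster keys ("preprod", "prod"), the lowercase namespace names and the "tier" label are illustrative assumptions, not Intuit’s actual configuration.

```python
import json

# Hypothetical cluster keys and namespace names. The article only says that
# pre-production clusters hold QA and E2E namespaces and production clusters
# hold staging and production namespaces.
CLUSTER_NAMESPACES = {
    "preprod": ["qa", "e2e"],
    "prod": ["staging", "production"],
}


def namespace_manifest(name: str, cluster: str) -> dict:
    """Build a minimal Kubernetes Namespace object (apiVersion v1, kind Namespace)."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            # Illustrative label recording which cluster tier the namespace belongs to.
            "labels": {"tier": cluster},
        },
    }


def manifests_for(cluster: str) -> list[dict]:
    """All Namespace objects for one cluster tier."""
    return [namespace_manifest(ns, cluster) for ns in CLUSTER_NAMESPACES[cluster]]


if __name__ == "__main__":
    # Wrap each tier's namespaces in a v1 List so it can be saved and applied as one file.
    for cluster in CLUSTER_NAMESPACES:
        doc = {"apiVersion": "v1", "kind": "List", "items": manifests_for(cluster)}
        print(json.dumps(doc, indent=2))
```

Saving one of the printed List objects to a file and running `kubectl apply -f` on it would create those namespaces in the current cluster. Per-namespace quotas and access controls, which supply most of the practical isolation, are left out of this sketch.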

One factor that significantly simplified Intuit’s Kubernetes migration was an application design that was already decomposed into stateless, API-driven microservices. The downside is that it required integrating the stateless worker nodes with various backend data services. Fortunately, these were already deployed in a separate AWS account and available via APIs. As Intuit’s diagram illustrates, connecting the two required peering the two VPCs and enabling cross-account access via AWS IAM, which was already used for account authentication. Again from TurboTax moves to Kubernetes: An Intuit journey — Part 1:

Much of the data layer was already being accessed via API through NAT Gateway, though a few services did have additional resource dependencies on other AWS services for datastores, memory queues, and cache. Rather than migrate these AWS services to the AWS account that housed the Kubernetes cluster, we enabled cross-AWS account access via IAM access controls, and set up VPC peering where necessary.

(Source: TurboTax moves to Kubernetes: An Intuit journey — Part 1)
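Cross-account access of the kind described in the quote usually comes down to two IAM documents: a role in the data account whose trust policy lets the cluster’s account call `sts:AssumeRole`, and a permissions policy on that role scoped to the resources the services need. The sketch below builds both as Python dicts. The account IDs, region, table name and the choice of DynamoDB as the datastore are hypothetical; the quote does not say which datastores are involved, and the VPC peering side is not shown.

```python
import json

# Hypothetical 12-digit account IDs; not Intuit's real accounts.
K8S_ACCOUNT_ID = "111111111111"   # account hosting the Kubernetes cluster
DATA_ACCOUNT_ID = "222222222222"  # account hosting datastores, queues and caches


def trust_policy(trusted_account_id: str) -> dict:
    """Trust policy for a role in the data account that principals in the
    trusted (cluster) account may assume via sts:AssumeRole."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
            "Action": "sts:AssumeRole",
        }],
    }


def table_read_policy(account_id: str, region: str, table: str) -> dict:
    """Permissions policy for that role, limited to reads on one DynamoDB table
    (DynamoDB is an assumed example of a backing datastore)."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:Query"],
            "Resource": f"arn:aws:dynamodb:{region}:{account_id}:table/{table}",
        }],
    }


if __name__ == "__main__":
    print(json.dumps(trust_policy(K8S_ACCOUNT_ID), indent=2))
    print(json.dumps(table_read_policy(DATA_ACCOUNT_ID, "us-west-2", "example-table"), indent=2))
```

Trusting the account’s `root` principal delegates the decision to the cluster account’s administrators, who must still grant `sts:AssumeRole` on this role to the specific identities the pods run as; naming those roles directly in the trust policy is the tighter alternative.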

Production deployment reveals six solvable problems

During three months of pre-production testing using several DR scenarios (including some that took out an entire AWS region), Intuit noted six significant problems with system scalability and performance. It documented these in a second engineering blog post detailing the technical issues and its solution to each. These primarily stemmed from default configuration parameters being too restrictive for an application the size of TurboTax. Specifically, the problems were:

- Kubernetes’ DNS service was unable to serve requests fast enough under heavy load; solved by increasing the cache size and the number of control nodes running the kube-dns service.
- The AWS Auto Scaling service couldn’t provision nodes fast enough; solved by adding a few spare nodes to each cluster during peak tax season.
- Nodes weren’t booting correctly due to external dependencies; solved by creating a custom system image (AMI) for cluster nodes.
- Issues with the Kubernetes interface to AWS IAM during some operations; solved with a configuration change.
- Network storms when the AWS ALB performed health checks on cluster nodes; solved via several configuration changes.
- Lost or delayed entries when aggregating log data; solved by running a log collection agent in each Kubernetes pod as a sidecar container.
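The logging fix in the last item follows the standard Kubernetes sidecar pattern: a log-shipping agent runs as a second container in the same pod and reads the files the application writes to a shared `emptyDir` volume. Below is a minimal sketch of such a pod spec built as a Python dict; the pod name, image names and log path are placeholders, and the article does not say which agent Intuit used.

```python
import json

# Placeholder log directory shared by the app and the sidecar (an assumption).
LOG_DIR = "/var/log/app"


def pod_with_log_sidecar(name: str, app_image: str, agent_image: str) -> dict:
    """Pod spec with an app container and a log-agent sidecar sharing an emptyDir volume."""
    shared_mount = {"name": "app-logs", "mountPath": LOG_DIR}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # emptyDir lives as long as the pod and is visible to every container in it.
            "volumes": [{"name": "app-logs", "emptyDir": {}}],
            "containers": [
                # The application writes its log files into LOG_DIR.
                {"name": "app", "image": app_image, "volumeMounts": [shared_mount]},
                {
                    "name": "log-agent",
                    "image": agent_image,
                    # Read-only mount: the agent only tails and forwards the files.
                    "volumeMounts": [{**shared_mount, "readOnly": True}],
                },
            ],
        },
    }


if __name__ == "__main__":
    spec = pod_with_log_sidecar("example-service", "example/app:1.0", "example/log-agent:1.0")
    print(json.dumps(spec, indent=2))
```

In practice the same `spec` would sit in a Deployment’s pod template. One plausible benefit of this layout is that each replica gets its own collector, rather than relying on a single node-level agent to keep up with every pod on the node.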

Satisfied with the fixes, Intuit deployed and successfully operated the production Kubernetes clusters in time for tax season in January. As this chart illustrates, from a baseline of about 1,100 pods, Intuit’s infrastructure scaled up to 2,500 in early April. Similarly, its baseline node count of 600 grew 50% to 900 by the end of tax season.

(Source: TurboTax moves to Kubernetes: An Intuit journey — Part 2)
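The growth figures from the chart can be compared directly. The sketch below uses the approximate values quoted above (1,100 to 2,500 pods, 600 to 900 nodes); it is simple arithmetic on those readings, not data from Intuit’s dashboards.

```python
def pct_growth(before: float, after: float) -> float:
    """Percentage increase from before to after."""
    return (after - before) / before * 100


if __name__ == "__main__":
    pods = (1_100, 2_500)   # approximate baseline and peak pod counts
    nodes = (600, 900)      # approximate baseline and peak node counts
    print(f"pods:  +{pct_growth(*pods):.0f}%")   # pods:  +127%
    print(f"nodes: +{pct_growth(*nodes):.0f}%")  # nodes: +50%
    print(f"pods per node: {pods[0] / nodes[0]:.1f} -> {pods[1] / nodes[1]:.1f}")  # 1.8 -> 2.8
```

Pods grew by about 127% while nodes grew by 50%, so average density rose from roughly 1.8 to 2.8 pods per node. One plausible reading, not stated by Intuit, is that much of the peak was absorbed by packing existing nodes more tightly, in line with the resource-utilization objective listed earlier.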

My take

Intuit demonstrated the feasibility and benefits of teaching an old software dog new tricks by moving its hallmark application onto cloud-based Kubernetes infrastructure. Wadher said that although Intuit didn’t realize cost benefits in year one, the instrumentation and dashboards it built to monitor resource usage and spending have made managing its costs much easier and more accurate. Furthermore, integrating the Kubernetes infrastructure with its Argo-based CI/CD workflow has significantly shortened the release cycle for new software from 1-3 months down to 3 weeks. Indeed, Wadher is confident that its Kubernetes-based infrastructure and processes will enable Intuit to achieve daily software releases within three years.

Intuit is typical of companies that have evolved from selling installed software to delivering managed services. Critical to this evolution is exploiting new design paradigms like microservices, automation tools like Argo and Jenkins, software packaging and management technology like Kubernetes, and the abundant features and capacity available from cloud operators like AWS. As Intuit migrates the rest of its portfolio to containers, the expertise gained during its initial Kubernetes experiment in the 2020 tax season will be essential to keeping up with the exponential growth in online revenue for products like QuickBooks.

Image credit - Pixabay


Author: Kurt Marko

Date: 2020-11-25

URL: https://diginomica.com/modernizing-tax-season-fixture-intuit-migrates-turbotax-kubernetes-and-lives-tell-tale

diginomica.com
