

Post syndicated from Chayan Panda.

Data scientists continuously look for ways to accelerate time to value for analytics projects.

In a previous blog, we provided a solution architecture for running data science use cases for medium to large enterprises across industry verticals.

In this post, we describe and deliver the infrastructure code to run a secure, scalable and highly available RStudio and Shiny Server installation on AWS.

The goal is to build a Shiny application that surfaces breast cancer prediction insights to users against a set of parameters.

Public URL Domain and Data Feed

The RStudio/Shiny deployment accounts obtain the networking information for the publicly resolvable domain from a central networking account.

Users upload data to the S3 buckets in the central data account or configure an automated service like AWS Transfer Family to programmatically upload files.
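The upload step above can be sketched in code. This is a minimal illustration only: the bucket name, the per-user key layout, and the helper names are all hypothetical, and the actual solution may use AWS Transfer Family rather than a direct SDK call.

```python
# Sketch (assumptions): the central data account exposes an S3 bucket named
# "central-data-lake" with a per-user, per-day key layout; both the bucket
# name and the key scheme are hypothetical.
from datetime import datetime, timezone


def build_upload_key(user: str, filename: str, when: datetime) -> str:
    """Build a dated S3 object key so uploads are partitioned per user/day."""
    return f"uploads/{user}/{when:%Y/%m/%d}/{filename}"


def upload_to_data_lake(path: str, user: str, bucket: str = "central-data-lake") -> None:
    # boto3 is imported lazily so the pure helper above works without AWS installed.
    import boto3

    key = build_upload_key(user, path.rsplit("/", 1)[-1], datetime.now(timezone.utc))
    boto3.client("s3").upload_file(path, bucket, key)


if __name__ == "__main__":
    upload_to_data_lake("patients.csv", "analyst1")
```

A dated key layout like this keeps manual uploads and programmatic SFTP uploads from colliding, but any partitioning scheme that suits your downstream analysis works equally well.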

This means that if Amazon ECS restarts the container in another AZ following a failover, the files and data for the container are not lost, as they are stored on Amazon EFS.

These include an RStudio front-end password, a public key for bastion containers, and central data account access keys, all stored in AWS Secrets Manager.

You can also create one RStudio container for each data scientist, depending on your compute requirements, by setting the cdk.json parameter individual_containers to true.
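Of the cdk.json values mentioned here, only the parameter name individual_containers comes from the text; the surrounding file structure in this sketch is hypothetical, and CDK context values are often stored as strings, so the reader below accepts either form.

```python
# Sketch: reading the individual_containers flag from a cdk.json-style file.
# Only the parameter name comes from the post; the structure is hypothetical.
import json

cdk_context = {
    "context": {
        "individual_containers": "true",  # one RStudio container per data scientist
    }
}


def read_flag(raw: str) -> bool:
    """Return True when individual_containers is set, whether bool or string."""
    value = json.loads(raw)["context"]["individual_containers"]
    return value is True or value == "true"


if __name__ == "__main__":
    print(read_flag(json.dumps(cdk_context)))
```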

If your compute requirements exceed Fargate container compute limits, consider using the EC2 launch type of Amazon ECS, which offers a range of Amazon EC2 instance types to fit your compute requirements.

Shiny containers are horizontally scalable, and the pipeline creates them in the private subnet using the Fargate launch type of Amazon ECS.

Users upload files to be analysed to a central data lake account, either with a manual S3 upload or programmatically using AWS Transfer for SFTP.

The bulk of the data transfer is expected to happen on the hourly schedule; the on-demand trigger should only be used when necessary.
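In the deployed solution the hourly cadence would live in a scheduler such as Amazon EventBridge; the pure helper below is just an illustration of the top-of-the-hour schedule, not part of the actual infrastructure code.

```python
# Sketch: computing the next top-of-the-hour run for the scheduled data
# transfer. The real schedule would be defined in the infrastructure code
# (e.g. an EventBridge rule); this helper only illustrates the cadence.
from datetime import datetime, timedelta


def next_hourly_run(now: datetime) -> datetime:
    """Return the next top-of-the-hour strictly after `now`."""
    return now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
```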

Ensure you have a Docker Hub login account; otherwise, you might get an error while pulling the container images from Docker Hub with the pipeline: "You have reached your pull rate limit."

Review the readmes delivered with the code, and ensure you understand how the parameters in cdk.json control the deployment and how to prepare your environment to deploy the CDK stacks via the pipeline detailed below.

Central network account – the Route 53 base public domain will be hosted in this account.

RStudio instance account – you can use as many of these accounts as required; each account deploys the RStudio and Shiny containers for one instance (dev, test, uat, prod), along with a bastion container and associated resources.

Central data account – this is the account used for deploying the data lake resources, such as the S3 bucket for storing the ingested source files.

Build the Docker container images in Amazon ECR in the central development account by running the image build pipeline as instructed in the readme.

Using the AWS console, create an AWS CodeCommit repository to hold the source code for building the images – for example, rstudio_docker_images.

Pass the repository name (for example, rstudio_docker_images) to the name parameter in cdk.json for the image build pipeline.

Pass the account numbers (comma separated) where RStudio instances will be deployed in the cdk.json parameter rstudio_account_ids.
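The parameter names above (name, rstudio_account_ids) come from the post; the comma-separated shape of rstudio_account_ids can be sketched with a small hypothetical parser, which is not part of the delivered pipeline code.

```python
# Sketch: rstudio_account_ids in cdk.json holds comma-separated AWS account
# numbers. This hypothetical helper shows the expected shape of that value.
def parse_account_ids(raw: str) -> list[str]:
    """Split the comma-separated rstudio_account_ids value into account numbers."""
    return [part.strip() for part in raw.split(",") if part.strip()]
```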

Monitor the pipeline (the pipeline name is the name you provided in the name parameter in cdk.json) and confirm the Docker images build successfully.

This checks whether all user emails have been registered with Amazon SES for all the RStudio deployment accounts in the region, before you can deploy RStudio/Shiny.
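The verification check can be approximated as follows. The pure helper is the testable part; the guarded main shows a possible boto3 call against the SES classic API, and the email address used is a placeholder, not from the post.

```python
# Sketch: find user emails not yet verified in Amazon SES. The helper is pure;
# the guarded main shows one possible SES lookup (assumes AWS credentials for
# the deployment account are configured).
def unverified_emails(required, verified):
    """Return the required emails that are not yet verified in SES."""
    return sorted(set(required) - set(verified))


if __name__ == "__main__":
    import boto3

    ses = boto3.client("ses")
    verified = ses.list_verified_email_addresses()["VerifiedEmailAddresses"]
    print(unverified_emails(["analyst1@example.com"], verified))  # placeholder email
```

Running a check like this per deployment account before the RStudio/Shiny deployment avoids a failed rollout caused by a single unverified address.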

We can demonstrate a typical data science use case, showcasing how to publish a Shiny application from the RStudio containers to the Shiny containers via the common EFS filesystem.

If the results are as expected, move on to developing a dashboard and publishing the model for business users to consume the machine learning insights.

This allows users to select data points on the chart and get the machine learning model inference as needed.

You can adjust the Probability Threshold slider to test how it changes the total count in the prediction, change the variables for the scatter plot, and select data points to test individual predictions.
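The dashboard itself is a Shiny application in R; the following Python helper is only a language-agnostic illustration of what moving the probability threshold does to the total predicted count, not code from the solution.

```python
# Sketch: counting positive predictions at a given probability threshold,
# mirroring what the Probability Threshold slider does in the Shiny dashboard.
def positive_count(probabilities, threshold):
    """Count predictions whose probability meets or exceeds the threshold."""
    return sum(1 for p in probabilities if p >= threshold)
```

Raising the threshold trades recall for precision: fewer cases are flagged positive, but each flagged case carries a higher model probability.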

In this blog, we demonstrated how to deploy a serverless architecture, walked through a data science use case in RStudio Server, and deployed an interactive dashboard in Shiny Server.

The solution creates a scalable, secure, and serverless data science environment for the R community that accelerates the data science process.