<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Dan Greene</title>
    <description>The latest articles on DEV Community by Dan Greene (@dgreene).</description>
    <link>https://dev.to/dgreene</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F98679%2Fa6ffbb81-e826-4ada-a45a-4b72606a9a74.jpeg</url>
      <title>DEV Community: Dan Greene</title>
      <link>https://dev.to/dgreene</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dgreene"/>
    <language>en</language>
    <item>
      <title>THE ROAD TO AWS RE:INVENT 2018 – WEEKLY PREDICTIONS, PART 2: DATA 2.0</title>
      <dc:creator>Dan Greene</dc:creator>
      <pubDate>Thu, 15 Nov 2018 22:35:06 +0000</pubDate>
      <link>https://dev.to/dgreene/the-road-to-aws-reinvent-2018--weekly-predictions-part-2-data-20-4ie8</link>
      <guid>https://dev.to/dgreene/the-road-to-aws-reinvent-2018--weekly-predictions-part-2-data-20-4ie8</guid>
      <description>&lt;p&gt;Originally published &lt;a href="https://www.3pillarglobal.com/insights/the-road-to-amazon-web-services-reinvent-2018-data-2-0-weekly-predictions-part-2"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Last week I made the easy prediction that at re:Invent, AWS would announce more so-called ‘serverless’ capabilities. It’s no secret that they are all-in on moving from server management to service management. I guessed at a few specific possibilities – SFTP-as-a-Service, ‘serverless’ EC2, and a few others.&lt;/p&gt;

&lt;p&gt;This week, I want to look at some of the other capabilities provided by AWS and make some predictions as to what announcements we might see. Why should any or all of this matter to you? If you’re in the business of processing, storing, and analyzing large sets of data, these updates may significantly impact the speed, efficiency, and cost at which you’re able to do so.&lt;/p&gt;

&lt;h1&gt;
  
  
  WEEK 2 PREDICTION: DATA 2.0
&lt;/h1&gt;

&lt;p&gt;While AWS has a number of existing tools to manage data ingestion and processing (e.g. Data Pipeline, Glue, Kinesis), I think adding an orchestration framework optimized for all the steps of a robust data processing pipeline would allow AWS’ data analytics tools (Athena, QuickSight, etc.) to really shine.&lt;/p&gt;

&lt;h2&gt;
  
  
  DATA-MAPPING-AS-A-SERVICE
&lt;/h2&gt;

&lt;p&gt;I cut my teeth with data integration on platforms like WebMethods. While it may have had some drawbacks, it was, as a solution set, really excellent at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Providing endpoints for data delivery&lt;/li&gt;
&lt;li&gt;Identifying data by location, format, or other specific data elements&lt;/li&gt;
&lt;li&gt;Routing the data to the right processors based on the above features&lt;/li&gt;
&lt;li&gt;Mapping each data entry from one format to another&lt;/li&gt;
&lt;li&gt;Delivering transformed data to the target location&lt;/li&gt;
&lt;/ul&gt;
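The steps above can be sketched as a tiny dispatcher. This is a hypothetical, pure-Python illustration of that identify-route-map-deliver flow; no WebMethods or AWS API is assumed, and all names here are invented:

```python
# Hypothetical sketch of the integration flow above: identify incoming
# data by format, route it to the right processor, map each record into
# a target schema, and deliver the result to a sink.
import csv
import io
import json

def identify(payload):
    """Classify a raw payload by inspecting its content."""
    text = payload.strip()
    if text.startswith("{") or text.startswith("["):
        return "json"
    return "csv"

def map_record(record):
    """Map one source record into the (invented) target schema."""
    return {"customer_id": record["id"], "total": float(record["amount"])}

PROCESSORS = {
    "json": lambda p: [map_record(r) for r in json.loads(p)],
    "csv": lambda p: [map_record(r) for r in csv.DictReader(io.StringIO(p))],
}

def process(payload, deliver):
    """Route, map, and deliver a payload; returns the record count."""
    records = PROCESSORS[identify(payload)](payload)
    for record in records:
        deliver(record)
    return len(records)
```

A managed service would swap the sink for S3 and the processors for Lambda or Fargate tasks, but the shape of the flow is the same.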

&lt;p&gt;I can see something akin to a managed Apache NiFi solution – much like AWS’ ElasticSearch Service. Tying in the ability to route tasks to be executed by Lambda and/or Fargate, supporting Directed Acyclic Graph (DAG) modeling, and tight integration with S3 for writing out data as both final and intermediate steps would be a game-changer for products that have to import and process data files – particularly from third parties.&lt;/p&gt;

&lt;h2&gt;
  
  
  S3 LIFECYCLE ON READ TIME
&lt;/h2&gt;

&lt;p&gt;One of my pet peeves with S3 lifecycle management is that moving from the Standard to the Infrequent Access storage class has nothing to do with how frequently the file is actually accessed. While I would imagine that the underlying capabilities of an object store make it very difficult to do this, it would provide a much-needed metric for making storage decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  DYNAMODB DEEP DOCUMENT MODE
&lt;/h2&gt;

&lt;p&gt;DynamoDB is a great hybrid key and document store, and I use it often for small document storage and retrieval. However, the current limits on document size and scan patterns make using DynamoDB as a managed, MongoDB-level solution a challenge. Providing more robust document-centric capabilities, while still supporting the scalability, replication, and global presence, would significantly “up the game” for DynamoDB. As a wish-list item, I would also like to see the pre-allocation of read and write throughput removed completely. Let each request set an optional throttle, but charge me for what I actually use rather than what I might use. The current autoscaling is a significant improvement over nothing – but it can be improved.&lt;/p&gt;

&lt;h2&gt;
  
  
  RDS – POLYGLOT EDITION
&lt;/h2&gt;

&lt;p&gt;For a while there, there was an interesting trend of trying to combine multiple database paradigms into a single view – document + graph, and so on. I think AWS may try to dip their toe into this space. By combining a few of their existing products behind the scenes, it would be interesting to link ElasticSearch, Aurora, and Neptune together into a solution that tries to combine the best of each storage paradigm. As with most all-in-one tools, I’m honestly not sure whether it would just do each of those things equally poorly. That said, I often recommend a multi-storage solution for clients’ data – each store optimized for a particular use case – so there may be something there.&lt;/p&gt;

&lt;h2&gt;
  
  
  S3 AUTO-CRAWLING AND METRICS
&lt;/h2&gt;

&lt;p&gt;Imagine setting a flag on a data bucket so that whenever a data file drops there, it is automatically classified, indexed, and ready for querying by Athena, Glue, or Hive. Having some high-level metrics on the data within (row counts, average values, etc.) would be useful for other business decisions. Adding in some SageMaker algorithms for data variance (e.g. random cut forest for discovering outliers and/or trends) to fire off alerts would be incredible, too.&lt;/p&gt;

&lt;h2&gt;
  
  
  WRAPPING IT UP
&lt;/h2&gt;

&lt;p&gt;In closing this week, I think there will be a lot of different announcements around data processing as an AWS-centric framework. AWS has most of the parts in play already – having AWS manage the wiring up of them so you only have to focus on the business value you are extracting from the data would realize the promise of the cloud for data processing.&lt;/p&gt;

&lt;p&gt;Going to be at re:Invent? Drop a comment below and let me know what you hope to see there or your thoughts on what’s next.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cloud</category>
      <category>reinvent</category>
    </item>
    <item>
      <title>The Road to AWS re:Invent 2018 - Weekly Predictions</title>
      <dc:creator>Dan Greene</dc:creator>
      <pubDate>Wed, 07 Nov 2018 20:15:37 +0000</pubDate>
      <link>https://dev.to/dgreene/the-road-to-aws-reinvent-2018---weekly-predictions-3jih</link>
      <guid>https://dev.to/dgreene/the-road-to-aws-reinvent-2018---weekly-predictions-3jih</guid>
      <description>&lt;p&gt;Originally published &lt;a href="https://goo.gl/eBFSjQ" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Every year in Las Vegas, AWS holds their biggest conference of the year. Tens of thousands descend upon the desert – I heard numbers of about 45,000 attendees last year. They go for the thousands of training sessions, the events, and the celebrations of success in the Amazon cloud: learning, seeing, and (for some) more partying than you can shake a cloud at, for a week every November.&lt;/p&gt;

&lt;p&gt;AWS takes this opportunity to announce the majority of their new features and products during the conference’s two keynote addresses. I’m going to take a semi-educated guess at what might be coming up in a few short weeks. I may be right, wrong, or something in between, but it’s a fun exercise to think about how people are using the cloud – AWS in particular – and where I think they could do better. It’ll be interesting to see whether they address the fact that, by some accounts, &lt;a href="https://www.forbes.com/sites/bobevans1/2018/04/27/microsoft-tops-amazon-in-q1-cloud-revenue-6-0-billion-to-5-44-billion-ibm-third-at-4-2-billion/#6274d30d5d4b" rel="noopener noreferrer"&gt;Azure has caught up to them&lt;/a&gt;, revenue-wise.&lt;/p&gt;

&lt;h1&gt;
  
  
  Weekly Prediction: Can I haz more serverless?
&lt;/h1&gt;

&lt;p&gt;Okay - so this is a pretty easy prediction: AWS has been moving more of their capabilities from ‘run a server to do it’ to ‘call our service to do it’. There are a number of areas that I think are primed for replacement by a serverless approach.&lt;/p&gt;

&lt;h3&gt;
  
  
  SFTP service
&lt;/h3&gt;

&lt;p&gt;Every solution I’ve helped build that processes files inevitably ends up needing to stand up an SFTP server. Typically these are small instances that capture the file and then upload it to S3, where the rest of the AWS serverless ecosystem can take over. Replacing this with a simple service you enable on a VPC would simplify many infrastructures. Just have it route to S3 or to an EFS mount - and you make a LOT of people happy. Throw in FTP and/or FTPS, or any other file transfer protocol, to offer ‘file transfer as a service’ as a capability.&lt;/p&gt;

&lt;h3&gt;
  
  
  VPN service
&lt;/h3&gt;

&lt;p&gt;The other server that most ‘serverless’ solutions still need to deploy is a VPN solution - whether via OpenVPN or other marketplace offerings - securing your cloud resources while allowing authorized access at the network level. This is critical to most product infrastructures. A service you can enable on a VPC that provides a basic VPN solution (possibly even compatible with the OpenVPN client) would be a godsend.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cognito as a Directory Service
&lt;/h3&gt;

&lt;p&gt;There are a number of AWS solutions that need end users that are, in actuality, separate from IAM users. For me, IAM is a control system for AWS API calls - not for capabilities that are independent of those APIs. Allowing Cognito User Pools to control access to the two solutions above would let Cognito act like an LDAP service for other software products. Along those lines, Cognito as an authentication system for EC2 instances would be pretty amazing too. Centralizing username/password as well as SSH access (by storing a user’s public key) could be much easier to manage than even the Simple Directory offering. Lastly, extending Cognito support to CodeCommit or a Docker image repository would separate IAM from these products’ workstreams, making them much less awkward to work with.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fargate EFS volumes
&lt;/h3&gt;

&lt;p&gt;I may have beaten this horse to death with another dead horse - I pretty much ask for it in any conversation with AWS regarding Fargate. The lack of persistent volume support limits the workloads that Fargate can take on. Providing the means to mount an existing EFS volume to the Docker container would open up a lot of flexibility and usage possibilities.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lambda time limit increase (again)
&lt;/h3&gt;

&lt;p&gt;I know that AWS just made the great change of tripling the Lambda timeout limit from 5 minutes to 15 minutes. But they’ve now shown that they can increase the limit, so I would like them to raise it to 60 minutes - or remove it entirely, for that matter. You’re paying for usage by the millisecond anyway. I do think keeping a user-defined limit is good, though, to avoid denial-of-wallet attacks from recursion and/or infinite loop issues.&lt;/p&gt;

&lt;h3&gt;
  
  
  Serverless EC2
&lt;/h3&gt;

&lt;p&gt;This one is a bit out there, to be sure, but hear me out - if AWS can run a database engine that autoscales the CPU and memory usage dynamically, it’s not out of the realm of possibility to do that on a ‘regular’ instance. On launch, you would set min and max levels of processing units and the thresholds to increase the number of units - again, very similar to the serverless Aurora model. I would expect to pay a premium for this - but for when you have workloads that are not under your control - customer data file processing, traffic analysis, etc. - this may be the perfect fit.&lt;/p&gt;

&lt;h3&gt;
  
  
  Next week…
&lt;/h3&gt;

&lt;p&gt;We’ll look at some of the other offerings outside of the serverless space and make some additional predictions.&lt;/p&gt;

&lt;p&gt;If you’re going to be at re:Invent and want to discuss - drop a comment below! 3Pillar is an AWS Advanced Consulting Partner, and we're always looking to learn more about how people are leveraging AWS to improve their product.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Stand by your Lambda - Overcoming AWS Lambda's Limitations</title>
      <dc:creator>Dan Greene</dc:creator>
      <pubDate>Thu, 04 Oct 2018 12:50:49 +0000</pubDate>
      <link>https://dev.to/dgreene/stand-by-your-lambda---overcoming-aws-lambdas-limitations-gnn</link>
      <guid>https://dev.to/dgreene/stand-by-your-lambda---overcoming-aws-lambdas-limitations-gnn</guid>
      <description>&lt;p&gt;This post originally appeared &lt;a href="https://goo.gl/Q7FkdY"&gt;here&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A short while ago, I pointed out some &lt;a href="https://dev.to/dgreene3p/silence-of-the-lambdas---5-antipatterns-for-aws-lambda-e60"&gt;Lambda anti-patterns&lt;/a&gt;. Following up on that post, I thought I’d also point out some tips and tricks for overcoming the limitations of AWS’ Function-as-a-Service offering, Lambda. Lambda as a service is, in all honesty, pretty awesome - self-scaling, modular code execution is an incredibly useful tool. However, we do see the classic trend: everything starts to look like a nail, and Lambda is the golden hammer. Lambda is designed as a discrete, small event handler, and when you start using it for other things (or even in normal use), you’ll start bumping up against a few critical limitations.&lt;/p&gt;

&lt;h1&gt;
  
  
  5 minute execution
&lt;/h1&gt;

&lt;p&gt;This is the big one that gets most people into trouble. Fun hint - unless you have a reason not to, set the timeout for all your functions to the full 5 minutes. You are only charged for actual usage, so rather than be surprised that something took a few seconds longer than you had configured, play it safe. The nature of Lambda is that it’s targeted at small, bite-sized processing tasks, and sometimes what seems like a great use case for Lambda will bump up against this limit and cause you much heartburn. Here are a few approaches you can take to mitigate this limit.&lt;/p&gt;

&lt;h3&gt;
  
  
  Divide and conquer
&lt;/h3&gt;

&lt;p&gt;A pretty common use case for Lambda is processing a file that drops on S3. If possible, instead of processing the whole file in a single Lambda invocation, I suggest splitting the work up by calling another Lambda to process a set number of rows that you are more than confident will complete under the limit. Basically, pass the S3 location, the start line number, and the end line number, and loop until you’ve kicked off each set. Alternatively, streaming the records into another solution such as SNS or Kinesis may suffice. Be aware that if the size of the data payload is out of your control, you’ll still want to put in some kind of stopgap measure.&lt;/p&gt;
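A minimal sketch of that fan-out, assuming boto3 for the invoke call; the worker function name, chunk size, and payload shape are all illustrative choices, not a fixed API:

```python
# Sketch of the fan-out pattern above: split a file into line ranges and
# kick off one async worker invocation per range. The chunking math is
# generic Python; the invoke call assumes a boto3 Lambda client.
import json

CHUNK_SIZE = 10000  # rows per worker; tune so a chunk finishes well under the limit

def line_ranges(total_lines, chunk_size=CHUNK_SIZE):
    """Yield (start, end) line ranges covering the whole file."""
    for start in range(0, total_lines, chunk_size):
        yield start, min(start + chunk_size, total_lines)

def fan_out(bucket, key, total_lines, lambda_client, worker="process-rows"):
    """Kick off one asynchronous worker invocation per line range."""
    for start, end in line_ranges(total_lines):
        lambda_client.invoke(
            FunctionName=worker,        # hypothetical worker function name
            InvocationType="Event",     # asynchronous invocation
            Payload=json.dumps({"bucket": bucket, "key": key,
                                "start": start, "end": end}),
        )
```

Each worker then reads only its slice of the S3 object, so no single invocation has to stay alive for the whole file.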

&lt;h3&gt;
  
  
  Cache some data and keep it warm
&lt;/h3&gt;

&lt;p&gt;While this may cause other issues (see below), caching some data in the /tmp partition provided to every function container can decrease execution time after the first call. You’ll have to balance whether you can build the cached data, write it to /tmp, and still perform the necessary functionality - and you’ll pay this price on each cold start of your function - but on warm executions it may save a LOT of processing time. To keep the function warm, you can schedule a CloudWatch scheduled event to ‘ping’ it - a no-op invocation that either keeps the container alive or starts one up so real hits are pre-warmed. You’ll need to find the right balance here to ensure you’re not spinning up extra copies of your function unnecessarily, but it’s something you can tune over time.&lt;/p&gt;
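The /tmp caching idea can be sketched like this; the cache path and the expensive build step are illustrative stand-ins:

```python
# Sketch of the /tmp caching technique above: build an expensive lookup
# once per container, then reuse the on-disk copy on warm invocations.
import json
import os

CACHE_PATH = "/tmp/lookup-cache.json"  # survives only while the container is warm

def build_cache():
    """Stand-in for an expensive build step (DB scan, S3 fetch, ...)."""
    return {"vip": ["a1", "b2"]}

def load_lookup(path=CACHE_PATH):
    """Use the cached copy if this container has one, else build and save it."""
    if os.path.exists(path):
        with open(path) as fh:
            return json.load(fh)
    data = build_cache()
    with open(path, "w") as fh:
        json.dump(data, fh)
    return data
```

Because the container can be recycled at any time, the code never assumes the cache exists; a cold start simply pays the build cost again.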

&lt;h3&gt;
  
  
  Call yourself as a lifeline
&lt;/h3&gt;

&lt;p&gt;If, in your use case, splitting up the processing of the data file doesn’t work - say the file needs to be processed sequentially - then you can use the context object to get the ‘time remaining’ and, as you approach the limit, call the same Lambda asynchronously with a file offset. The subsequent invocation skips the previously processed lines and continues on. As with any recursion, be wary of putting yourself in an infinite loop - you never want to be in the position of explaining how you performed a ‘denial of wallet’ attack on yourself.&lt;/p&gt;
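A sketch of that lifeline pattern. The context object’s get_remaining_time_in_millis() is the real Lambda API; the safety margin, worker callback, and re-invoke hook are illustrative stand-ins:

```python
# Sketch of the self-invocation lifeline above: process sequentially,
# and hand the remaining work to a fresh async invocation of the same
# function before the clock runs out.
import json

SAFETY_MS = 30000  # hand off with 30 seconds still on the clock

def out_of_time(context):
    """True once remaining time drops to the safety margin or below."""
    # max(...) clamps at zero, so this reads: remaining minus margin hit zero
    return max(context.get_remaining_time_in_millis() - SAFETY_MS, 0) == 0

def process_from(lines, offset, context, process_one, reinvoke):
    """Process `lines` sequentially from `offset`; re-invoke before timeout."""
    for i in range(offset, len(lines)):
        if out_of_time(context):
            # Async call back into this same Lambda with the next offset
            reinvoke(json.dumps({"offset": i}))
            return i
        process_one(lines[i])
    return len(lines)
```

The returned offset doubles as a guard against infinite recursion: if it never advances between invocations, something is wrong and you should stop re-invoking.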

&lt;h3&gt;
  
  
  Turn up the volume - of memory
&lt;/h3&gt;

&lt;p&gt;In some cases, increasing the allocated memory may reduce processing time - watch your function’s CloudWatch Logs output for invocation memory usage. If peak usage is at or near the top of the allocation, upping the number may help. This is likely a stopgap measure, but an incredibly easy one to implement. As a bonus - if you see your max memory used is consistently well under the memory allocated, reducing it will lower your costs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Change the game - or at least the language
&lt;/h3&gt;

&lt;p&gt;Different languages have different strengths - building your Lambda in a compiled language such as .NET Core, Go, or even Java may perform better for your particular use case than an interpreted language like NodeJS or Python. It might not be an option, but it’s something to keep in mind.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bring in the big guns
&lt;/h3&gt;

&lt;p&gt;Let’s face it - there will be some scenarios and events that will just take more than 5 minutes to process. In this age of endless data, data files are getting larger, and you’re going to have to deal with it. As much as I’m a fan of Lambda, it’s not always going to cut it. There are two major escalation points to choose from. The most straightforward is to move your processing from Lambda to a Fargate task - serverless container execution gives you many of the benefits of Lambda without many of the limitations. That comes with more of a preparation cost, but done strategically, Fargate containers can dovetail very nicely into your existing serverless product architecture. The second approach is to leverage EMR, Glue, or another service to do the heavy lifting, and use Lambda only as the triggering mechanism to ensure the processing flow gets started.&lt;/p&gt;

&lt;h1&gt;
  
  
  Request payload
&lt;/h1&gt;

&lt;p&gt;The next most likely item to get caught up on is Lambda’s payload limits - 6MB for synchronous execution, but only 128K for asynchronous calls. Truth be told, if you’re passing large payloads around an event framework, you’re doing it wrong :). You should be checking your payload size before calling a Lambda programmatically - and because sometimes you’re not in control of your message size, you should also know some workarounds.&lt;/p&gt;
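Checking the serialized size before an async invoke can be as small as this sketch; the 128K figure is the async limit quoted above, and the helper name is invented:

```python
# Sketch of a pre-invoke payload guard, per the advice above: measure
# the JSON-serialized size and refuse to send anything over the async
# limit quoted in this article.
import json

ASYNC_LIMIT_BYTES = 128 * 1024  # the 128K async limit discussed above

def fits_async_limit(payload_obj, limit=ASYNC_LIMIT_BYTES):
    """True if the JSON-serialized payload is within the async limit."""
    size = len(json.dumps(payload_obj).encode("utf-8"))
    # range membership reads: size is one of 0..limit inclusive
    return size in range(limit + 1)
```

When the guard fails, fall back to one of the workarounds below (split the payload, or pass a pointer to scratch storage) instead of letting the invoke call blow up.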

&lt;h3&gt;
  
  
  Divide and conquer (again)
&lt;/h3&gt;

&lt;p&gt;As with the advice above for processing time, if possible, split your payload to be processed by separate invocations of your function. Lambda scales out automatically, so splitting the payload and passing one part per invocation not only avoids the payload limit but, as above, runs faster in parallel.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use some scratch space
&lt;/h3&gt;

&lt;p&gt;The limit applies only to the invocation payload, not the data processed, so you can send an S3 or database location instead of the data itself. I was recently processing customer event data and had reorganized it into a map to allow efficient lookup. To avoid re-doing this mapping in subsequent calls, I attempted to pass the map along in the payload to child Lambda calls. As you can guess, the lookup map got too big over time and blew up the Lambda invocation. I ended up using DynamoDB as the scratch space - the required throughput was so low that the cost was negligible, and it performed fantastically. Note that DynamoDB has a 400KB item limit, so keep that in mind. I could have also used ElastiCache, but I simply went with a resource the application was already using. Splitting the data and writing it out to S3 is an even better way to go, as the dropping of the file on S3 can itself be the mechanism that triggers the subsequent Lambda. Think about your control mechanisms as events rather than flow, and these usage patterns will develop before your very eyes.&lt;/p&gt;
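The pointer-passing pattern can be sketched like this; the dict-backed store is a stand-in for DynamoDB, ElastiCache, or S3, and all names are illustrative:

```python
# Sketch of the scratch-space pattern above: persist the bulky lookup
# map out-of-band and pass only a small pointer in the event payload.
import json
import uuid

def stash(store, data):
    """Write `data` to the scratch store and return a pointer to it."""
    key = f"scratch/{uuid.uuid4()}"
    store[key] = json.dumps(data)
    return key

def build_child_event(store, source_location, lookup_map):
    """Event for the child Lambda: a small pointer instead of the big map."""
    return {"source": source_location, "lookup_ref": stash(store, lookup_map)}

def load_lookup(store, event):
    """Child side: dereference the pointer to get the map back."""
    return json.loads(store[event["lookup_ref"]])
```

The event stays tiny no matter how large the lookup map grows, which is exactly what keeps it clear of the payload limit.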

&lt;h1&gt;
  
  
  Networking
&lt;/h1&gt;

&lt;p&gt;Okay - this isn’t a limitation per se, but there are some related limitations. Running Lambdas inside a VPC poses a few restrictions. First, each instance of your function will run inside a container, and that container will be issued an elastic network interface (ENI) on instantiation (which also adds a significant increase to function cold start time) - and your account may have limits on ENIs. Second, you can only run as many instances of your function as you have IP addresses available in your subnet. This is a fundamental issue because you typically can’t control how many instances of your function are running. So: only run functions inside your VPC that actually need to run inside your VPC, and for those that do, design them to minimize the likelihood of massive concurrent execution. You can also now set a limit on the maximum number of concurrent invocations of your function. Exceeding it will drop you into an AWS retry scenario - and will, by nature, throttle your function. You may be trading one set of error messages for another, but it’s there as a lever you can use.&lt;/p&gt;

&lt;h1&gt;
  
  
  Memory and Disk Limits
&lt;/h1&gt;

&lt;p&gt;Now, disk limits may sound odd in a discussion of serverless technologies, but utilizing the /tmp drive space is a pretty common technique to cache data (as mentioned above), which may minimize execution time on non-cold starts. However, it’s limited to 512MB - so trying to cache too much will cause your container to fail. Use the space sparingly, but use it where it can help.&lt;/p&gt;

&lt;p&gt;Memory limits are another factor - depending on your code, hitting the function memory limit may cause slowness, or may even cause the code to crash. As you are charged by GB-seconds, you do not want to overprovision your function, particularly one called often, but you don’t want to hit that limit either. Do a periodic analysis of your function’s CloudWatch Logs output: the final line lists the provisioned memory and the peak memory used. Start high, then tune down - aim for peak memory usage around the 80% mark, just in case you have some unexpected behavior, but take into account the volatility of your memory usage to find the right, but not oversized, mark.&lt;/p&gt;
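That tuning pass can be scripted as a sketch: pull the two numbers out of the REPORT line Lambda writes at the end of each invocation and back out the size that would put the peak at roughly 80%. The sample line mirrors the real REPORT format, but the RequestId is a placeholder:

```python
# Sketch of mining the end-of-invocation REPORT line for memory headroom
# and computing an allocation that targets ~80% peak utilization.
import re

SAMPLE = (
    "REPORT RequestId: example  Duration: 102.25 ms  "
    "Billed Duration: 200 ms  Memory Size: 512 MB  Max Memory Used: 84 MB"
)

def memory_stats(report_line):
    """Return (provisioned MB, peak MB) from a REPORT log line."""
    size = int(re.search(r"Memory Size: (\d+) MB", report_line).group(1))
    used = int(re.search(r"Max Memory Used: (\d+) MB", report_line).group(1))
    return size, used

def recommended_size(peak_mb, target=0.80):
    """Provisioned memory that would put this peak at ~80% utilization."""
    return int(peak_mb / target)
```

In the sample, 84 MB of a 512 MB allocation is heavily overprovisioned; run this across a window of invocations and size to the worst peak, not a single line.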

&lt;h1&gt;
  
  
  Summary
&lt;/h1&gt;

&lt;p&gt;So, in closing, there are a lot of great things about Lambda and how it fits into the serverless ecosystem (and yes, they are different), but making the most of it depends upon knowing its strengths and its limitations. At 3Pillar Global, we are excited about the promise of serverless computing in all the forms it takes, from Lambda, to Fargate, to serverless databases, and beyond. If you really love serverless, then stand by through the limitations, because after all, it’s just a Lambda.&lt;/p&gt;

</description>
      <category>awslambda</category>
    </item>
    <item>
      <title>Silence of the Lambdas - 5 antipatterns for AWS Lambda</title>
      <dc:creator>Dan Greene</dc:creator>
      <pubDate>Wed, 05 Sep 2018 16:59:44 +0000</pubDate>
      <link>https://dev.to/dgreene/silence-of-the-lambdas---5-antipatterns-for-aws-lambda-e60</link>
      <guid>https://dev.to/dgreene/silence-of-the-lambdas---5-antipatterns-for-aws-lambda-e60</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.3pillarglobal.com%2Fwp-content%2Fuploads%2F2018%2F01%2Fsilence-of-the-lambdas-fb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.3pillarglobal.com%2Fwp-content%2Fuploads%2F2018%2F01%2Fsilence-of-the-lambdas-fb.png" alt="Silence of the Lambdas"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This article originally appeared at &lt;a href="https://www.3pillarglobal.com/insights/silence-lambdas-5-anti-patterns-aws-lambda" rel="noopener noreferrer"&gt;https://www.3pillarglobal.com/insights/silence-lambdas-5-anti-patterns-aws-lambda&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It’s no secret that AWS is pushing their serverless offerings at every opportunity. Serverless containers, storage, NoSQL, and even relational databases are abstracting the running of product software away from the underlying infrastructure it runs on. At the core of AWS’ serverless landscape is their Lambda product. It is Function-as-a-Service (FaaS), meaning it executes code packages on various event-driven triggers, like HTTP calls, notification topics, S3 file drops, and even scheduled cron jobs.&lt;/p&gt;

&lt;p&gt;Here at 3Pillar Global, we are using them to build serverless products that span computer vision, data processing, and all kinds of product development, both for our customers as well as internally. In using them, however, we have found a number of ‘gotchas’ that you should look out for as you adopt this new model of cloud computing. We’ve gathered them up for you here, and hope that these pointers help sidestep - or at least prepare for - challenges that you may experience with Lambda.&lt;/p&gt;

&lt;h1&gt;
  
  
  Building Lambdas Like Server-Full Code
&lt;/h1&gt;

&lt;p&gt;When you’re writing code to run on traditional server software, you typically take advantage of server startup time, and when an end-user is involved, pre-load wherever you can to minimize code execution time. In Lambda, your server can effectively restart anytime, so you can - and will - pay and re-pay that startup cost. Focus on writing code that is streamlined and fast-to-answer. Do not load anything that isn’t needed until it’s needed.&lt;/p&gt;

&lt;p&gt;This is especially relevant if you’re using Java as your language, as almost all practices are about loading classes at startup. This is exacerbated by how easy it is to inadvertently pull in massive dependency trees when using frameworks like Spring. I am a huge Spring fan, but it is not well-suited for Lambda container lifecycles. I also suggest being brutal in adding dependencies to your package/pom/gradle/nuget file. If you’re hitting the Lambda code limit (50 MB), you can analyze your dependency tree and possibly even put in some explicit excludes.&lt;/p&gt;

&lt;p&gt;That said, you can leverage the times when the container for your code is reused by basically taking a singleton approach to expensive resources - use it if it’s there, but don’t assume it is there, initialize it if it’s not. There is also a ‘/tmp’ mount on all Lambda containers that you can use as a scratch area, but again, you cannot assume the same container will be used from invocation to invocation. As a ‘last’ resort, you may leverage CloudWatch Events to periodically ‘ping’ your function to keep it hot. I would not recommend this unless it provides significant benefit, and even then you cannot rely on it always working, so you would still need to handle hitting a ‘cold’ Lambda.&lt;/p&gt;
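The singleton approach described above can be sketched in a few lines; the connection factory is an illustrative stand-in for any expensive resource:

```python
# Sketch of the warm-container singleton above: the expensive resource
# lives at module scope, is created lazily, and is reused whenever the
# container survives between invocations.
_connection = None  # persists only while the container stays warm

def make_connection():
    """Stand-in for an expensive setup step (DB client, SDK client, ...)."""
    return {"connected": True}

def get_connection():
    """Use the existing resource if this container has one, else build it."""
    global _connection
    if _connection is None:
        _connection = make_connection()
    return _connection
```

Every handler path goes through get_connection(), so the code works identically on a cold start (build) and a warm one (reuse), which is exactly the "don't assume it's there" discipline the paragraph above calls for.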

&lt;h1&gt;
  
  
  Ignoring Monitoring Services
&lt;/h1&gt;

&lt;p&gt;Normally, you can investigate the server logs for issues. Having no servers doesn’t mean no logs, though - by default, function output gets routed to CloudWatch Logs. These logs are organized by ‘streams’, so finding the exact execution you’re looking for can be difficult. Make your life easier by using a correlation ID across API/Lambda invocations. Additionally, provide unique text identifiers on each error to make finding the records much easier. For performance-related inquiries, the last line of Lambda logging includes total request time and memory used. Also, X-Ray provides insight into container startup and tracing capabilities - leverage it to get into the details of operation.&lt;/p&gt;
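The correlation-ID and error-identifier advice can be sketched like this; the field names and error codes are illustrative conventions, not a fixed schema:

```python
# Sketch of the logging advice above: carry one correlation ID through
# the event chain, and tag an easy-to-grep code on every error line.
import json
import uuid

def with_correlation_id(event):
    """Reuse the caller's correlation ID, or mint one at the entry point."""
    event.setdefault("correlation_id", str(uuid.uuid4()))
    return event

def log_error(event, error_code, message):
    """One-line, searchable error record for CloudWatch Logs."""
    return json.dumps({
        "level": "ERROR",
        "code": error_code,  # unique text identifier, e.g. "ERR_BAD_ROW"
        "correlation_id": event["correlation_id"],
        "message": message,
    })
```

With this in place, a single CloudWatch Logs filter on either the code or the correlation ID pulls every record for one failure across every stream it touched.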

&lt;p&gt;While CloudWatch Logs will never run out of drive space like a server might, the default retention of log events is forever. To avoid an ever-increasing cost, you may want to change your retention policies to have the logs auto-expire in an acceptable time frame.&lt;/p&gt;

&lt;h1&gt;
  
  
  Doing it all Manually
&lt;/h1&gt;

&lt;p&gt;AWS’ serverless options are quite widespread and getting broader by the week. When attempting to build a production solution backed by serverless technologies, you can very easily get overwhelmed trying to define all the parts and wire them all together. Thankfully, many solutions have emerged in the market to simplify putting it all together. AWS has two related solutions - the Serverless Application Model (SAM), and a Python-specific framework called ‘Chalice’. The non-AWS solutions - Serverless.com, Apex, and Zappa - are similar in nature, although they offer multi-cloud support, since serverless is not just an AWS thing.&lt;/p&gt;

&lt;p&gt;In any case, be sure to leverage the ability to define secondary resources (e.g. IAM roles, S3 buckets, DynamoDB tables) that your services depend on. Given how easy it is to add these resources, it’s a great time to push ‘infrastructure as code’ if you haven’t already. Controlling your supporting resources by storing the needs in the source code repository, and eliminating manual deployment and configuration greatly stabilizes your product operations.&lt;/p&gt;

&lt;h1&gt;
  
  
  Failure to Establish Standards &amp;amp; Conventions
&lt;/h1&gt;

&lt;p&gt;AWS Lambda is very open in terms of how you configure it - any function that matches the right signature can be defined as the handler. You can name things however you want. If you just rush in, you will likely find yourself in a rat’s nest of code that is incredibly hard to maintain and troubleshoot. I suggest you establish naming and environmental conventions early - e.g. always name your function handler the same as the function name, or always name the method ‘handler’ (or whatever pattern you want) - just define one and enforce it.&lt;/p&gt;

&lt;p&gt;Since not all AWS resources support the concept of ‘environments’, be sure to use naming conventions on things like S3 bucket names, DynamoDB table names, etc., and have the environment the code is running in passed to the Lambda as a means of tying it all together.&lt;/p&gt;
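One way to sketch that convention is a tiny naming helper; the APP_ENV variable, the default, and the resource names are all illustrative:

```python
# Sketch of the environment naming convention above: derive resource
# names from an environment value passed to the Lambda, so 'dev' and
# 'prod' resources never collide.
import os

def resource_name(base, env=None):
    """Prefix a resource name with the running environment."""
    env = env or os.environ.get("APP_ENV", "dev")  # hypothetical env var
    return f"{env}-{base}"
```

Every bucket, table, and queue name then flows through one function, which makes the convention trivially enforceable in code review.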

&lt;p&gt;Sit down and also decide logically where you draw the lines between services, functions, code repositories, etc. I would start coarse-grained and split things as the product/code gets more complex. It is far easier to split code than to merge code.&lt;/p&gt;

&lt;p&gt;Lastly, one of the benefits of Lambda is its polyglot nature - you can code each function in a separate supported language if desired. I would highly recommend keeping your product to as few languages as possible, but do be open to the option of leveraging other languages if there is a library or capability needed (Java and Python come to mind here). Keep these as the ‘exception’ rather than the rule to reduce cognitive overhead.&lt;/p&gt;

&lt;h1&gt;Don’t Stop Regular Best Practices Just Because It’s Serverless&lt;/h1&gt;

&lt;p&gt;There are many practices that people have a habit of dropping simply because serverless code deploys differently. That, combined with the newness of the technology, causes many efforts to skip some incredibly important coding practices. Just because your code is no longer running on explicit hardware doesn’t absolve you of bugs. You should still apply the same rigor to source control, still perform code reviews, and still perform static analysis of your code.&lt;/p&gt;

&lt;p&gt;In fact, AWS provides many code-release capabilities that are themselves serverless: CodeCommit for git repositories, CodeBuild for CI builds, CodeDeploy for deployments, and CodePipeline to orchestrate it all. Additionally, you will still need to write unit tests and execute them at build time - lacking a server doesn’t lessen the value of testing. You can use your standard set of testing tools for your language of choice; a benefit of Function-as-a-Service is that it pushes you toward the single responsibility principle, which actually lends itself well to testing. You can also create additional functions to use as test harnesses and/or utilities.&lt;/p&gt;
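&lt;p&gt;Because a handler is just a function, a plain unit test needs no server and no AWS account. A hedged sketch - the handler and event shape here are invented for illustration:&lt;/p&gt;

```python
import json

def handler(event, context):
    """Hypothetical Lambda handler: returns an uppercased greeting."""
    name = event.get("name", "world")
    body = json.dumps({"greeting": f"HELLO, {name.upper()}"})
    return {"statusCode": 200, "body": body}

def test_handler_uppercases_name():
    # context is unused by this handler, so a stub (None) is fine in tests
    result = handler({"name": "dan"}, None)
    assert result["statusCode"] == 200
    assert json.loads(result["body"])["greeting"] == "HELLO, DAN"
```

&lt;p&gt;Tests like this run in milliseconds at build time, which is exactly where CodeBuild (or any CI) should execute them.&lt;/p&gt;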

&lt;p&gt;Lastly, there are a couple of ways to perform ‘local’ development. The first is to use developer-specific environments and still deploy your code and functions to AWS. This has the benefit of the code operating in an environment identical to where it will be deployed, but it has a few minor drawbacks - breakpoints are more difficult to manage, and there is a cost involved in deploying to AWS (not a lot, but it’s there). Add in the clutter of having an environment per team or per developer on top of ‘dev,’ ‘test,’ and ‘prod,’ and you can see that there is an upkeep cost. Fortunately, there are multiple solutions - AWS provides ‘SAM Local,’ Serverless.com supports local invocation of functions, and there’s even localstack - a very robust collection of ‘local’ instances of many AWS services, even runnable as Docker containers. These solutions can be leveraged to rapidly deploy to a developer’s machine and debug efficiently without polluting your AWS account and/or git repositories.&lt;/p&gt;
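&lt;p&gt;For the SAM Local route, the workflow is roughly the following - assuming the SAM CLI is installed and a &lt;code&gt;template.yaml&lt;/code&gt; defines a function (the function and file names here are illustrative):&lt;/p&gt;

```shell
# Invoke the function once, locally, inside a Lambda-like Docker container,
# feeding it a sample event payload:
sam local invoke MyFunction -e events/sample.json

# Or run a local API Gateway emulator for iterative request/response debugging:
sam local start-api
```

&lt;p&gt;Either command gives you a tight edit-run loop on your own machine before anything touches a shared AWS environment.&lt;/p&gt;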

&lt;h1&gt;Special Bonus Lambda Gotchas&lt;/h1&gt;

&lt;h2&gt;Recursion is Risky with No Limits&lt;/h2&gt;

&lt;p&gt;A last warning is to watch out for recursive execution of functions, whether intentional or not. In a normal environment, your CPU would max out if you inadvertently put yourself in an infinite loop (the function triggers an event, which in turn triggers the function…). In serverless, you will have executed a “DoW Attack” - a Denial of Wallet attack on yourself - and your $10-$20 development bill can shoot up to the thousands with little warning. This is an anti-pattern for all event-driven models, but with the autoscaling capacity of AWS, it can make for a really awkward conversation with your Engineering VP or CFO. Some ways to detect or prevent this are to put CloudWatch alarms on your total Lambda invocations, or to implement billing alerts. If recursion is really necessary for your product, you can pass data between function calls (in the event object) to keep a recursion count, and put in a failsafe that aborts execution if it reaches a wildly unreasonable level - say 10,000.&lt;/p&gt;
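&lt;p&gt;A minimal sketch of that failsafe - the counter field name and the (deliberately low) limit are illustrative, not a prescribed value:&lt;/p&gt;

```python
MAX_DEPTH = 25  # illustrative failsafe ceiling, far above any legitimate depth

def handler(event, context):
    """Carry a recursion counter in the event payload and abort if it runs away."""
    depth = event.get("recursion_depth", 0)
    if depth > MAX_DEPTH:
        # Fail loudly instead of looping (and billing) forever.
        raise RuntimeError(f"Recursion failsafe tripped at depth {depth}")

    # ... do the real work here, then re-trigger with the incremented counter ...
    next_event = {**event, "recursion_depth": depth + 1}
    return next_event
```

&lt;p&gt;The key point is that the counter travels inside the event itself, since each invocation is otherwise stateless.&lt;/p&gt;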

&lt;h2&gt;Idempotence is Key&lt;/h2&gt;

&lt;p&gt;There’s a dirty little secret about Lambda execution - your function may be triggered multiple times for the same root event. Part of the reason is that many of the potential sources are ‘deliver-at-least-once,’ so they may actually fire multiple times; the other is that Lambda, under certain circumstances, may retry execution of your code (more details &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/retries-on-errors.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;). Because of this, all of your Lambda code should be idempotent. While this is trivial for read operations, it can become significantly complicated for write operations. The ‘easiest’ way to handle this is to leverage the request ID that is passed in from all sources, and find a way within your application logic to check whether that request ID has already been processed. If events are passed around, be sure to include the original source request ID in the payload of later events.&lt;/p&gt;
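&lt;p&gt;The shape of that check, sketched with an in-memory set as a stand-in for a durable store - in a real deployment the lookup would be something like a DynamoDB conditional write keyed on the request ID, and the field name here is illustrative:&lt;/p&gt;

```python
# Stand-in for durable storage; a real handler would use a conditional write
# against a database so the dedupe survives across invocations and containers.
_processed_request_ids = set()

def handler(event, context):
    """Process each root request ID at most once, even if delivered twice."""
    request_id = event["request_id"]  # assumed to be propagated in every payload

    if request_id in _processed_request_ids:
        # Duplicate delivery or Lambda retry: acknowledge without re-doing work.
        return {"status": "duplicate", "request_id": request_id}

    _processed_request_ids.add(request_id)
    # ... perform the side-effecting work exactly once here ...
    return {"status": "processed", "request_id": request_id}
```

&lt;p&gt;Note that the set-then-work ordering matters: recording the ID before the work errs toward skipping on a crash, while the reverse errs toward repeating - a conditional write in a database lets you make that trade-off atomically.&lt;/p&gt;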

&lt;h1&gt;Summary&lt;/h1&gt;

&lt;p&gt;In closing, the future of product deployment will absolutely include serverless aspects - and on AWS, that means Lambda. Moving to these features opportunistically can provide much of the promise of microservices, and if you do it right, as few of the negatives as possible.&lt;/p&gt;

&lt;p&gt;Stay cloudy my friends.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>lambda</category>
      <category>serverless</category>
    </item>
  </channel>
</rss>
