I am familiar with the technologies, but I have not used them yet. Because your requirements are very vague, I will list the most popular Apache solutions (there are other alternatives).
Even Google, which created MapReduce, no longer uses it. They built a newer, more flexible framework that is now under the Apache umbrella (Beam): beam.apache.org/
So just a quick overview:
To move the data from your data lake to the processing units and back: Apache NiFi or Apache Airflow, perhaps with Kafka in between if needed.
These tools also allow data enrichment!
To process your data: Beam or Flink (both support batch + streaming), or Spark (especially if you have any ML algorithms). If the data is text based, you may need something built on Lucene (Solr or Elasticsearch).
Managed solutions would be BigQuery/BigTable, managed Spark and more: cloud.google.com/products/big-data/
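To make the move / enrich / process split a bit more concrete, here is a rough sketch in plain Python. It does not use the Beam, Flink, NiFi, or Spark APIs; it only illustrates the shape of such a pipeline, and how the same transforms can run over a finite batch or over a stream. The record format (`user,action`), the `REGION_BY_USER` lookup table, and all function names are made up for illustration.

```python
from collections import Counter
from typing import Iterable, Iterator

# Hypothetical lookup table used for enrichment (an assumption, not from the post).
REGION_BY_USER = {"alice": "eu", "bob": "us"}


def ingest(lines: Iterable[str]) -> Iterator[dict]:
    """Movement step: parse raw 'user,action' lines into records."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        user, action = line.split(",", 1)
        yield {"user": user, "action": action}


def enrich(records: Iterable[dict]) -> Iterator[dict]:
    """Enrichment step: attach a region to each record, 'unknown' if missing."""
    for record in records:
        region = REGION_BY_USER.get(record["user"], "unknown")
        yield {**record, "region": region}


def process(records: Iterable[dict]) -> Counter:
    """Processing step: count events per region."""
    return Counter(record["region"] for record in records)


def run_pipeline(lines: Iterable[str]) -> Counter:
    """Chain the three steps; works on any iterable, finite list or generator."""
    return process(enrich(ingest(lines)))


# Batch input: a finite list that is fully available up front.
batch = ["alice,login", "bob,click", "alice,click", "carol,login"]


def stream() -> Iterator[str]:
    """Stream-like input: the same lines, produced one at a time."""
    yield from batch


batch_result = run_pipeline(batch)
stream_result = run_pipeline(stream())
print(batch_result)
```

In a real setup each step would probably live in a different tool (for example NiFi or Kafka for ingestion, Beam/Flink/Spark for processing), but the idea of writing the transforms once and feeding them either bounded or unbounded input is roughly what the unified batch + streaming frameworks aim for.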
The requirements are vague because I just didn't want to go into too much detail.
I hadn't heard of Apache Beam before, but it looks quite interesting. Will definitely look at it!