Spark is becoming more SQL-centric, and direct use of RDDs has dropped sharply thanks to the better performance and ease of use of Spark SQL and DataFrames. Many data engineers are happy to work in Spark, and many data pipelines are being converted to Spark as teams move from traditional to modern cloud-based data infrastructure. So the next big thing here will be ZERO-CODE, REUSABLE and GENERIC Spark processing, where a pipeline is described by configuration rather than hand-written code. Yes! Many solutions are emerging to make the work of Big Data engineers a lot lighter. Enterprises can drastically reduce development cost and timelines. Optimizations can be added as parameters and custom variables.
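To make the zero-code idea a little more concrete, here is a minimal sketch of a generic, config-driven job. The JSON layout, the table names and the `build_statements` helper are all hypothetical, made up for illustration and not taken from any particular product; the only real Spark pieces are the `SET` and `INSERT OVERWRITE TABLE` SQL statements and the `spark.sql.shuffle.partitions` setting.

```python
import json

# Hypothetical pipeline definition. Every key here is an illustrative
# assumption, not a schema used by any specific zero-code tool.
PIPELINE_JSON = """
{
  "source_table": "raw_orders",
  "target_table": "daily_revenue",
  "select": ["order_date", "SUM(amount) AS revenue"],
  "where": "status = 'COMPLETE'",
  "group_by": ["order_date"],
  "spark_conf": {"spark.sql.shuffle.partitions": "64"}
}
"""


def build_statements(config):
    """Turn a pipeline config into a list of Spark SQL statements."""
    statements = []
    # Optimizations arrive as parameters: each one becomes a SET statement.
    for key, value in sorted(config.get("spark_conf", {}).items()):
        statements.append(f"SET {key}={value}")

    # Assemble the query from the configured pieces.
    query = f"SELECT {', '.join(config['select'])} FROM {config['source_table']}"
    if config.get("where"):
        query += f" WHERE {config['where']}"
    if config.get("group_by"):
        query += f" GROUP BY {', '.join(config['group_by'])}"

    statements.append(f"INSERT OVERWRITE TABLE {config['target_table']} {query}")
    return statements


statements = build_statements(json.loads(PIPELINE_JSON))
for statement in statements:
    print(statement)
```

In a real job, each generated statement could be handed to `spark.sql(...)` on a SparkSession, so a new pipeline would mean a new config file rather than new code.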
Kafka, the next hero of Big Data, now has a stable KSQL and is becoming a staging data store. WHAT?? Yes: Kafka is evolving a lot with its new additions. Kafka is an enterprise data ingestion platform; it can do very low-latency streaming, and it can hold data as a staging area for a configurable period of time. With all of these influences, data processing and pipeline building with Spark and Kafka will become the next big thing.
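The "configurable period of time" comes down to topic-level retention settings. As a small sketch, the helper below (a hypothetical function, not part of any Kafka client) turns a retention window into the `retention.ms` and `cleanup.policy` values that Kafka topics accept. How the settings are applied (CLI, admin client, infrastructure tooling) is left out on purpose.

```python
from datetime import timedelta


def staging_topic_config(keep_for):
    """Build topic-level settings that keep records for `keep_for`.

    `retention.ms` and `cleanup.policy` are real Kafka topic configs;
    this helper and its name are illustrative assumptions.
    """
    retention_ms = int(keep_for.total_seconds() * 1000)
    return {
        # Delete old log segments instead of compacting them by key.
        "cleanup.policy": "delete",
        # How long Kafka keeps a record before it becomes eligible for deletion.
        "retention.ms": str(retention_ms),
    }


config = staging_topic_config(timedelta(days=7))
print(config)
```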
Flink has another string to its bow with its new addition, Flink SQL, which brings Flink a lot closer to data engineers. Apache Flink is available in Amazon EMR, and Cloudera’s CDF (public and private cloud) has Flink as a major component. With increasing demand for low-latency streaming, real-time machine learning and AI solutions, Flink will be a good addition to your skills list.
Airflow: This should be a priority on your bucket list. Airflow will play a vital role in workflows and orchestration thanks to its easy integration with Spark, Kafka, Kubernetes and many other widely used components, which earns it red-carpet approval in organisations that prefer stable open-source solutions. AWS recently announced "Amazon Managed Workflows for Apache Airflow" (MWAA), an eye-opener for techies and big data enthusiasts to focus on and explore more deeply.
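At its core, orchestration means running tasks in dependency order. The sketch below shows just that idea using Python's standard-library `graphlib` (Python 3.9+). It is not Airflow's API, and the task names are hypothetical; an Airflow DAG would add scheduling, retries and operators on top of the same ordering principle.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "ingest_from_kafka": set(),
    "transform_with_spark": {"ingest_from_kafka"},
    "data_quality_check": {"transform_with_spark"},
    "publish_report": {"data_quality_check"},
}


def execution_order(dependencies):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(DEPENDENCIES)
print(order)
```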
NiFi: NiFi’s Extract and Load capabilities, with numerous reusable connectors, an easy-to-use web-based UI and limited transformation ability, are helping many engineering teams achieve low latency, high throughput and greater reliability in data flow and data ingestion. NiFi usage has grown sharply in organizations focussing on IoT, smart device management and other areas where real-time ingestion and streaming play a vital role.
Along with all the above components, 2021 will be a major year of growth for cloud data infrastructure and managed services.
Keep learning more about these subjects and develop a challenging career.