Job Title - Big Data Engineer
Location - Bangalore, Chennai, Gurgaon
Experience required - 5-8 years
Job Summary:
We are seeking Big Data Engineers to join our offshore team. The ideal candidates will have extensive experience in PySpark, real-time data processing, and Kafka. They should be well-versed in designing and implementing ETL pipelines and working with streaming data frameworks in cloud-based environments.
Key Responsibilities:
- Develop, optimize, and maintain streaming data pipelines using PySpark and Apache Kafka.
- Design and implement scalable ETL processes for real-time and batch data processing.
- Work with Amazon EMR, Apache Spark, Apache NiFi, or similar frameworks to build near-real-time data pipelines.
- Develop data solutions and frameworks to handle high-volume, high-velocity data streams.
- Implement data storage solutions for structured and unstructured data, ensuring efficiency and reliability.
- Write clean, maintainable, and well-documented code using Python, Groovy, or Java.
- Optimize data structures and schemas for data processing and retrieval efficiency.
- Collaborate with cross-functional teams to define, design, and implement data-driven solutions.
- Troubleshoot and resolve performance bottlenecks and data quality issues.
- Stay updated with the latest technologies and best practices in big data and streaming architectures.
Required Qualifications & Skills:
- Proficiency in PySpark and strong understanding of distributed computing principles.
- Hands-on experience with Apache Kafka (including ksqlDB, MirrorMaker, or similar tools) for real-time data streaming.
- Strong programming skills in at least one of the following: Python, Groovy, or Java.
- Good understanding of data structures, ETL design, and data storage solutions.
- Experience working with Amazon EMR, Apache Spark, Apache NiFi, or similar streaming data frameworks.
- Ability to design and implement scalable and high-performance data pipelines.
- Experience with cloud-based big data solutions (AWS, GCP, or Azure) is a plus.
- Strong problem-solving and analytical skills.
- Excellent communication and collaboration abilities.
Preferred Qualifications:
- Experience with SQL and NoSQL databases.
- Exposure to containerization technologies such as Docker and Kubernetes.
- Familiarity with CI/CD pipelines for data engineering workflows.
- Understanding of data governance, security, and compliance best practices.
Why Join Us?
- Work on cutting-edge real-time streaming and big data technologies.
- Collaborate with an expert team in a fast-paced, dynamic environment.
- Competitive compensation and opportunities for professional growth.
- Gain experience in modern cloud-based data architectures.
If you are passionate about data engineering, big data, and real-time analytics, we invite you to apply and be a part of our innovative team!
Please apply here or send your CV to sandya.velamuri@derisk360.com