Key Responsibilities:
- ETL Development: Use Talend to develop, optimize, and maintain ETL workflows for ingesting, processing, and transforming data from a variety of sources.
- Data Pipeline Design: Develop scalable, high-performance data pipelines using Apache Spark and Python for batch and real-time data processing.
- Data Integration: Extract and integrate data from multiple sources such as SQL Server, PostgreSQL, Amazon Redshift, and Cloudera.
- Data Quality: Ensure data quality and consistency by implementing validation, cleansing, and transformation logic.
- Collaboration: Work closely with data scientists, business analysts, and other stakeholders to understand data requirements and support data-driven decision-making.
- Data Modeling: Assist in designing and developing database tables and schemas, including optimizing the performance of relational databases such as MySQL and PostgreSQL as well as NoSQL databases (e.g., MongoDB, Cassandra).
- Automation: Automate routine data management tasks using Python scripting, Talend, or other workflow automation tools.
- Troubleshooting & Optimization: Identify and resolve performance bottlenecks, data discrepancies, and pipeline failures.
Required Qualifications:
- Bachelor’s degree in Computer Science, Data Science, Information Systems, or a related field.
- 3+ years of hands-on experience with Talend for ETL development.
- Strong proficiency in SQL and Python.
- Extensive experience with Apache Spark for big data processing (both batch and streaming).
- Proficient in working with relational databases (SQL Server, PostgreSQL, MySQL) and non-relational databases (MongoDB, Cassandra).
- Solid understanding of data structures, database design, and data modeling.
- Strong problem-solving and troubleshooting skills.