Data Engineer Datalake

  • Buenos Aires, Ciudad Autónoma de Buenos Aires, Argentina
  • Full-Time
  • On-Site
  • 5,000 USD / Month

Job Description:

Responsibilities:

This associate will be part of the Data and AI Engineering team, focusing on the Databricks and Spark-based data platforms that power enterprise-scale analytics and AI initiatives. They will play a key role in building and optimizing data pipelines, data catalogs, and governance frameworks to support data-driven decision-making across the organization.

    Will design and implement scalable PySpark pipelines in Databricks for batch and streaming workloads (see the sketch after this list).

    Will configure and manage Databricks Unity Catalog for data cataloging, classification, lineage, and access control.

    Will optimize data availability, latency, performance, and cost within Databricks and associated cloud platforms.

    Will implement data governance, audit, and compliance frameworks to ensure security and trust in enterprise data.

    Will define and manage semantic data models and Delta Lake architectures for cross-domain analytics.

    Will collaborate with ML and AI teams to deliver feature engineering pipelines and ML-ready datasets.

    Will lead data engineering projects as a subject-matter expert (SME), ensuring adherence to best practices and agile delivery.

    Will create documentation, share knowledge, and mentor team members on Databricks and Spark best practices.

    Will support and troubleshoot production data pipelines and ensure high availability.
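
As a rough illustration of the pipeline work above, here is a minimal PySpark sketch of a batch load and a streaming (Auto Loader) variant writing to a Delta table. All paths, table names, and column names are illustrative assumptions, not details of this role.

```python
# Minimal sketch, assuming JSON files land in cloud storage and a Delta
# table "analytics.events" is the target; every name here is hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("events-ingest").getOrCreate()

# Batch: read the landed files, dedupe on a business key, stamp ingest time.
raw = spark.read.json("/mnt/landing/events/")
clean = (
    raw.dropDuplicates(["event_id"])
       .withColumn("ingested_at", F.current_timestamp())
)
clean.write.format("delta").mode("append").saveAsTable("analytics.events")

# Streaming: the same hop via Databricks Auto Loader ("cloudFiles"),
# which incrementally picks up new files from the landing path.
stream = (
    spark.readStream.format("cloudFiles")
         .option("cloudFiles.format", "json")
         .option("cloudFiles.schemaLocation", "/mnt/checkpoints/events/schema")
         .load("/mnt/landing/events/")
         .withColumn("ingested_at", F.current_timestamp())
)
(stream.writeStream
       .option("checkpointLocation", "/mnt/checkpoints/events")
       .toTable("analytics.events"))
```

A production version would add schema enforcement, watermarking for late data, and job-level monitoring, but the batch/streaming symmetry above is the core pattern.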

Skills Required:

    Must have 7+ years of professional experience building scalable, high-performance data platforms.

    Must have a bachelor's degree in Computer Science, Data Engineering, or a related technical field (or equivalent experience).

    Must have 5+ years of experience designing and implementing Big Data solutions using Apache Spark / PySpark.

    Must have 5+ years of experience with ETL/ELT pipeline development, data modeling, and data transformation frameworks.

    Must have 3+ years of hands-on experience managing Databricks environments, including Unity Catalog, cluster management, governance, and security (a Unity Catalog sketch follows this list).

    Must have strong experience in metadata management, data lifecycle management, and access control.

    Must have strong experience (5+ years) in .NET, especially in developing RESTful APIs using C#.

    Must have strong experience (5+ years) in SQL Server, including writing stored procedures, schema design and optimization, and performance tuning for large datasets.
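
To make the Unity Catalog expectations concrete, below is a minimal sketch of the kind of catalog setup, RBAC grants, and column tagging this role involves, issued from Python. All catalog, schema, table, and group names are hypothetical, and the statements assume a Unity Catalog-enabled Databricks workspace with sufficient privileges.

```python
# Hypothetical Unity Catalog governance sketch; every object and group
# name is an assumption. Requires a Unity Catalog-enabled workspace.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("CREATE CATALOG IF NOT EXISTS analytics")
spark.sql("CREATE SCHEMA IF NOT EXISTS analytics.sales")

# RBAC: let an analyst group discover and query the schema.
spark.sql("GRANT USE CATALOG ON CATALOG analytics TO `data-analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA analytics.sales TO `data-analysts`")
spark.sql("GRANT SELECT ON SCHEMA analytics.sales TO `data-analysts`")

# Classification: tag a sensitive column so lineage, masking, and audit
# tooling can track it.
spark.sql(
    "ALTER TABLE analytics.sales.orders "
    "ALTER COLUMN customer_email SET TAGS ('pii' = 'email')"
)
```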

Preferred Skills:

    Preferred experience with cloud data platforms such as Azure Data Lake, Snowflake, or AWS S3.

    Strong knowledge of Unity Catalog security model (RBAC, lineage, data masking, access policies).

    Experience implementing data governance frameworks (e.g., CDMC, DAMA-DMBOK).

    Proficiency in Python, PySpark, SQL, and modern data engineering practices.

    Experience with DevOps automation (CI/CD, YAML pipelines, Terraform, GitHub Actions).

    Experience with Delta Lake and the Medallion architecture (Bronze/Silver/Gold layers); a Bronze-to-Gold sketch follows this list.

    Knowledge of streaming data processing (Structured Streaming, Kafka, Event Hubs).

    Excellent problem-solving and analytical skills.

    Experience in production support of large-scale data platforms.

    Strong communication and documentation skills; ability to collaborate with cross-functional teams.
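
Because the preferred skills call out Delta Lake and the Medallion architecture, here is a minimal Bronze-to-Silver-to-Gold hop in PySpark. It assumes a pre-existing Bronze table; every table and column name is a placeholder.

```python
# Medallion sketch under assumed tables "bronze.orders" -> "silver.orders"
# -> "gold.customer_summary"; all names and columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

bronze = spark.read.table("bronze.orders")

# Silver: enforce types, drop rows missing the business key, deduplicate.
silver = (
    bronze.withColumn("order_ts", F.to_timestamp("order_ts"))
          .filter(F.col("order_id").isNotNull())
          .dropDuplicates(["order_id"])
)
silver.write.format("delta").mode("overwrite").saveAsTable("silver.orders")

# Gold: aggregate for BI consumption and ML feature pipelines.
gold = silver.groupBy("customer_id").agg(
    F.count("order_id").alias("order_count"),
    F.sum("amount").alias("lifetime_value"),
)
gold.write.format("delta").mode("overwrite").saveAsTable("gold.customer_summary")
```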

Qualifications:

In addition to the requirements above, we are looking for someone with a bit more experience in a few specific areas:

- Firstly, we need someone who is proficient with Delta Live Tables (DLT) and has hands-on experience productionizing DLT pipelines (a brief DLT sketch follows this list).

- Additionally, we require expertise in Kafka or a comparable event-streaming technology capable of reading and processing events into data/Delta lakes.

- Lastly, a thorough understanding of data quality checks between data/Delta lakes and OLTP databases is essential.
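
As a hedged sketch of the three points above, the Delta Live Tables pipeline below reads a Kafka topic into a raw table and applies DLT expectations on the way to a validated table. The broker address, topic, and column names are assumptions, and the code runs only inside a Databricks DLT pipeline, where the `spark` session is provided by the runtime.

```python
# DLT sketch: Kafka -> raw Delta table -> validated Delta table.
# Broker, topic, and column names are assumptions for illustration.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw events ingested from Kafka")
def raw_events():
    # `spark` is injected by the DLT runtime inside a pipeline.
    return (
        spark.readStream.format("kafka")
             .option("kafka.bootstrap.servers", "broker:9092")  # assumption
             .option("subscribe", "events")                     # assumption
             .load()
             .select(
                 F.col("value").cast("string").alias("payload"),
                 F.col("timestamp").alias("event_ts"),
             )
    )

@dlt.table(comment="Events that passed data quality checks")
@dlt.expect_or_drop("payload_present", "payload IS NOT NULL")  # drop failures
@dlt.expect("recent_event", "event_ts >= current_timestamp() - INTERVAL 7 DAYS")
def validated_events():
    return dlt.read_stream("raw_events")
```

Expectations cover in-lake quality; reconciling against the OLTP source (for example, comparing row counts or checksums over a JDBC read of the SQL Server tables) would typically complement them.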