Data Engineer

As a data engineer for OceanSMART, you will build and maintain data pipelines that acquire both real-time and static data from our external suppliers. You will make this data available to our development teams so it can be exposed through our Software as a Service solution.
Our real-time and batch processing pipelines process a daily load of over 200 million messages and derive insights from billions of historical records.

Design and maintain reliable, scalable, efficient data processing pipelines to process real-time data using event-driven approaches.
Design, develop and maintain highly scalable Kafka Streams processing microservices using Java and the Quarkus framework, and deploy them in Kubernetes.
Use Apache Spark with Scala/Python to process historical data (billions of records, terabytes in size) and derive optimizations and insights.
Take responsibility for data modelling in RDBMS, NoSQL data stores such as HBase/MongoDB, and archive data stores on Azure Blob Storage or ADLS Gen2.
Set up monitoring and alerting for the data pipelines.
Identify opportunities and explore ways to enhance data infrastructure to improve data acquisition, data processing, and data visualization processes
Collaborate with DevOps for data pipeline setup and configuration

Personal Traits

Open-minded and curious.
Speaks up when solutions are not ideal and communicates suggested solutions to cross-functional peers.
Able to work independently: you will be the first brick of the team, which brings many opportunities but also many challenges.
A true team player with great communication and interpersonal skills.

A POTENTIAL CANDIDATE WILL HAVE

Required Experience:

Bachelor's or Master's degree in Computer Science, Software Engineering, Information Technology, or a related technical field.
3+ years' experience as a Data Engineer or in a similar role.
Understand the big picture
Experience with big-data technologies such as Apache Kafka and Apache Spark.
Experience with JVM languages (Java or Scala) and Python.
Strong grasp of multi-threading and atomic operations.
Computation framework: Spark (Azure Batch & Databricks)
Databases:
SQL: PostgreSQL (PostGIS), SQL Server, TimescaleDB
NoSQL: HBase, Cassandra, MongoDB
Object data stores: Azure Blob Storage, including its different storage tiers
Mapping: GeoServer
Understanding of designs for resilience, fault tolerance, high availability, and high scalability.

Desirable

Hands-on experience with popular data platforms such as Azure, AWS or GCP, using event-driven, microservice architectures for loosely coupled, highly scalable designs.
Experience in performance tuning and optimizing big-data programs.
Familiarity with the Scrum framework and Scrum with Kanban.

SOME OF OUR BENEFITS

Great salary package. Annual performance review.
13th-salary Bonus for all staff.
Premium Healthcare Insurance (2 sponsored packages): you and your spouse/child/parent.
Annual Health Check-up for all staff.
Annual Loyalty Award packages (3mil - 5mil - 10mil), 5-year Award, and 10-year Award.
Good career advancement opportunities.
Product-oriented. Agile project management style. Dynamic and English-speaking working environment.
Opportunity to acquire technical knowledge and experience in the latest technologies.
Up to 18 days of annual leave per year, plus 5 sponsored extra days.
Working hours: 8 hours x 5 days/week (Monday to Friday). 30-minute break at 4 PM every day.
And so much more!

If this interests you, please contact us for a coffee so we can share more about the role and learn more about you.