Big Data Engineer
***No recruiters please - we will not respond***
About The Role
We have a diverse, multi-disciplinary team. The Big Data Engineer role exists to provide engineering subject matter expertise in our client projects. This may be at the designing/planning stage or at the building/testing stage, or both. Within a project there may be one or more Big Data Engineers, depending on the size and complexity of the work.
While the work you will do each day can vary, the following are some common tasks that you are likely to get involved in.
- Designing, implementing and testing end-to-end data processing pipelines. This usually means selecting and integrating the appropriate Big Data tools and frameworks for the job in hand, and can involve any or all of the following:
- Databases and storage methods;
- Messaging systems, such as Kafka;
- Big Data ingestion tools or frameworks;
- Data processing engines/frameworks;
- Orchestration and resource management;
- Integrating into downstream Machine Learning pipelines;
- Data wrangling and visualisation.
- In addition to implementing the pipelines, best-practice QA, environment setup and ongoing governance are first-order concerns:
- Documenting test cases and doing code reviews;
- Provisioning appropriate resources, and configuring them thoughtfully;
- [Depending on experience] Planning, designing and implementing hybrid cloud architectures that incorporate on-premise and public/private cloud components.
- Planning and implementing appropriate role-based access control;
- Setting up effective monitoring, and refining configuration on an ongoing basis to maximise the signal-to-noise ratio;
- Building useful metadata collection into the pipeline;
- Maximising automation and configurability, using Airflow or other tools/techniques.
At Data Reply a Big Data Engineer is, above all else, a software engineer: someone who writes well-structured and well-tested code. You are able to reason about the underlying distributed systems that are being extended, and you have a good grasp of foundational concepts in computer science such as data structures, OOP, and algorithmic complexity.
As an engineer, you take a keen interest in measures of system performance (including reliability, efficiency and quality) and strive to find ways to improve them – especially through thoughtful automation.
Quality assurance is fundamental to the role. The successful candidate should:
- Have a thorough understanding of the software development lifecycle;
- Be able to develop standards and procedures to determine product quality and release readiness;
- Understand and employ TDD/BDD practices when appropriate;
- Be able to find and diagnose bugs in software;
- Have solid experience with one or more relevant testing libraries.
The success of our business depends on our people and how they execute, so you will take the initiative to drive the practice forward.
Requirements
- You must have the right to work in the UK;
- Bachelor’s degree in Computer Science or Software Engineering (First class or 2:1);
- 5+ years of recent experience writing Java code in a data-intensive systems environment;
- Comfortable working with structured and unstructured datasets of all shapes and sizes;
- Familiar with SQL and NoSQL database concepts (e.g. 3NF, ACID, CAP, CDC…) and data warehousing concepts (Star schema, Snowflake schema, Data staging, etc);
- Understanding of modular software design and API design (in particular REST);
- Aware of file formats commonly used in Big Data pipelines, and their relative strengths (Avro, Parquet, etc);
- Experienced in using Apache Spark to implement batch ETL pipelines;
- Experienced in writing optimised SQL queries and familiar with the concept of table partitions;
- Comfortable working at the Linux command line;
- Comfortable working with Git.
Things That Will Set You Apart
Any of the following is a bonus:
- Advanced degree in relevant field;
- Practical experience of developing applications using Python and/or Scala;
- Experience with any cloud platform (Azure, AWS, GCP);
- Experience with stream-processing systems (e.g. Apache Flink, Apache Beam or Spark Streaming);
- Administration experience with any commercial Hadoop distribution (Cloudera / MapR / Hortonworks…);
- Administration experience with Apache Kafka and any NoSQL database (e.g. Apache Cassandra, MongoDB);
- Experience with tools for automating software/environment build and deployment, such as Maven, rpm, pip, Jenkins, Ansible, Chef, Puppet, Terraform, DC/OS;
- Basic understanding of Machine Learning.
Benefits
- Competitive remuneration
- Business MacBook Pro
- Culture of learning (including commitment to training and study time)
- Active group social programme - including weekend Hackathons around Europe and monthly 'Aperitif' nights out in London
- Health, pension, etc.
- Modern, open offices 5 minutes from Buckingham Palace and Hyde Park!