Job Purpose and Background in summary
We are looking for a Data Scientist to lead and deliver the goals of CDP’s new data platform. The successful candidate will design and deliver script migration, help build up the Azure data platform capabilities, and work closely with the Data & Insights Department to ensure cross-organization consistency.
This role requires experience in building data pipelines, deploying models on the Azure stack, automating pipelines, and sourcing and preparing data together with the data engineering team. The successful candidate will need to demonstrate the ability to work and communicate effectively with others, including stakeholders and thematic teams, to ensure that processes are followed, deliverables are aligned to milestones, and outputs are built to agreed quality standards.
CDP is a not-for-profit charity that runs the global disclosure system for investors, companies, cities, states and regions to manage their environmental impacts. The world’s economy looks to CDP as the gold standard of environmental reporting with the richest and most comprehensive dataset on corporate and city action. In 2021 we launched our new five-year strategy, Accelerating the Rate of Change. Visit https://cdp.net/en or follow us @CDP to find out more.
Key responsibilities include:
- Delivery of script migration from various on-prem version control systems and sources
- Redevelopment and enhancement of scripts to take advantage of the new data platform (Databricks capabilities and the Azure stack)
- Delivery of automated data cleaning and structuring algorithms
- Assistance in third-party provisioning and preparation of data
- Translation of requirements into code
- Collaboration with cloud team on configuration and best practices of data cloud platform
- Translation of SQL code to Python
- Creation of pre- and post-processing scripts to fit current code bases with no or minimal alterations
- Scripting and dashboarding for performance tracking
- Creation of Power BI insight dashboards with excellent visualizations
- Productionising analysis pipelines through a cloud toolset and hosting static and dynamic presentations of the generated insight.
Required skills and experience:
- MSc/PhD in Computer Science, Statistics, Mathematics or similar.
- At least 2 years of experience using open-source programming languages for large-scale analysis (Python and R, Rspark, PySpark), working with databases and data stores (MongoDB, Parquet, Hive), and using SQL to query databases
- A strong mathematical and statistical background with a deep understanding of statistical inference, experimental design, sampling, and simulation
- Strong experience in training machine learning models and putting them into production, using both structured and unstructured data in big data pipelines on Azure
- Experience with well-known code libraries for data preprocessing (pandas, dplyr, tidyr, scipy, feature-engine, beautiful soup, scrapy, spacy, nltk, TextBlob, fastText, polyglot, requests, json, functools).
- Good technical communication & presentation skills in English.
- Be able to work in a matrix environment within a virtual team.
- Excellent problem-solving skills.
- Strong project experience with NLP, text analytics and other relevant areas (e.g. text classification, topic detection, information extraction, named entity recognition, entity resolution, question answering, sentiment analysis, event detection, language modelling).
- Experience with managing and deploying models using Azure Databricks, Azure Data Lake, Azure Data Factory
- Excellent data visualization skills using Power BI or similar tools.
- Experience with version control and shell scripting
Desired skills and experience (optional):
- Experience working as part of a scrum team.
- Experience with automated testing scripts and environment promotion pipelines (e.g. Azure DevOps pipelines).
- A good understanding of GHG and sustainability data.
- Knowledge of the financial system and capital markets.
- An awareness of environmental issues, particularly as they relate to our core themes of Climate Change, Deforestation and Water Security.
- Ambition to enable and coach colleagues as part of an expanding organization with a growing data science team.
This is a full-time role based at CDP’s London office, reporting to the Head of Data Science and Products, who is based in Berlin.
Salary and benefits: £34,000 - £36,000 per annum, 30 days’ holiday plus bank holidays, generous non-contributory pension provision, Employee Assistance Programme, life assurance, training and development, flexible working opportunities and other benefits.
Interested applicants must be eligible to work legally in the UK. We cannot sponsor this role.
Before you apply
We’ll only use the information you provide to process your application. For more details on how we use your information, see our applicant’s privacy notice. By uploading your CV and covering letter, you are permitting CDP to use the information you have provided for recruitment purposes.
How to apply:
Please upload your CV in the application form, along with a covering letter of no more than two pages as an additional document, setting out how you meet the required skills and experience or key responsibilities. The deadline is 9th October 2022.