We are looking for an ambitious individual who never gets tired of handling large volumes of unstructured data generated from multiple online and offline sources.
Data is the most important fuel for the solutions we build at our company. You will build tools and frameworks that can scrape data with minimal developer intervention.
Our goal is to build 100K+ stable data pipelines, which requires a high degree of automation. As a Web Scraping-focused Data Engineer, you will be responsible for extracting and ingesting data from websites using web crawling tools.
In this role, you will own the creation of the tools, services, and workflows that improve crawl/scrape analysis, reporting, and data management. We will rely on you to test the data and the scrapes to ensure accuracy and quality. You will also own the process of identifying and fixing broken scrapes and scaling scrapes as needed.
- Experience running large-scale web scrapes
- Solid Python knowledge
- Familiarity with techniques and tools for crawling, extracting, and processing data (e.g. Scrapy, pandas, MapReduce, SQL, BeautifulSoup)
- Familiarity with PDF file management and text/object extraction tools, including OCR
- Experience in e-commerce web scraping is a plus
- Work experience with NoSQL databases to store raw data is a plus
- Experience with version control, open-source practices, and code review
- Experience with applications designed to display archived web content
- Great communication skills (written and spoken English)