AlphaSense provides an AI-based search engine for market intelligence, used by the largest and fastest-growing firms globally. Our mission is to curate and semantically index the world’s market and company information, including the vast high-value content sets that traditional web search engines cannot reach. With 1000+ enterprise clients, AlphaSense helps knowledge professionals become dramatically more productive, and gain an information edge by discovering critical data points and trends that others miss.
You will build a scraping framework that makes it easy to add new web sources. The system will fetch millions of documents every month. The ideal candidate has strong crawling and scraping skills in Python, along with solid experience in cloud computing.
You will join our team of world-class experts developing the AlphaSense platform. The team is at the core of what we do and is responsible for implementing cutting-edge technology for scalable, distributed crawling, search, and text processing.
Responsibilities
- Developing scalable systems to crawl and scrape data (in the form of HTML pages or documents)
- Quickly creating proofs of concept for new crawlers/scrapers
- Leveraging cloud computing resources (AWS) to optimally execute back-end processing
Requirements
- Bachelor’s or Master’s Degree in Computer Science or a related discipline.
- Minimum 5 years of software development experience, mainly with Python.
- Minimum 2 years of hands-on experience in crawling/scraping using frameworks such as Scrapy, Beautiful Soup, or Selenium.
- Strong computer science fundamentals (data structures, algorithms, multithreading, etc.).
- Solid hands-on experience working in distributed and scalable application environments.
- Excellent oral and written communication skills.
Nice to have
- Experience with Java with Spring.
- Working knowledge of Elasticsearch, Redis, Solr/Lucene, and cloud platforms such as AWS or GCP.
- Working knowledge of NoSQL databases such as DynamoDB.
- Experience working with Linux platforms, Docker, and Kubernetes (K8s).