The Search Platform team at the Wikimedia Foundation (WMF) is seeking an expert consultant to partner with us in investigating and implementing Natural Language Processing (NLP) as part of our query analysis pipeline. We are soliciting proposals to add specific components focused on two main NLP areas:
- Analyzing the spelling mistakes people make when querying, and providing results based on the corrected spelling; and
- Improving our “Did You Mean” suggestions, which offer search options close to the determined query intent when a query returns few or no results.
Our current query analysis pipeline uses a set of algorithms to break queries down into tokens, which are then further processed to determine, as best we can, the intent of the query so that we can provide the most relevant and best-ranked results across almost 300 languages. This is aided by a machine learning component in the form of a learning-to-rank plugin for Elasticsearch for the top 19 languages (by search volume). Adding NLP to our analysis chain will help us achieve greater search satisfaction, and this project will lay the foundations for more NLP work in the future.
Ideally, the NLP work for this project should be either developed as a PHP module or encapsulated in its own Elasticsearch plugin, which will be incorporated into the pipeline with the help of the Search Platform engineering staff. We are open to other ideas as well, as long as we see a path to incorporating this work into the query analysis pipeline we currently run. The specific programming language(s) can be Java and/or PHP, and possibly Python if there is a need, and we would be willing to entertain other options, as long as we can safely incorporate them into our Elasticsearch-centered ecosystem of components. To be clear, however, Elasticsearch experience is not required, as we can help with any integrations required on that end.
Most importantly, respondents should have previous experience applying NLP to search and/or spelling correction. Beyond that, it would be great if you have experience with Elasticsearch, and with building testing and analysis components to help determine the effectiveness of query results as NLP techniques are applied. In your proposal, please indicate your prior experience and briefly summarize how you plan to approach this work, including your preferred programming language(s) and any expectations you have about the infrastructure required to support your direction.
We are fiercely dedicated to open source software at WMF, and all work completed needs to be made available under open source licensing. No closed source, proprietary solutions will be considered.
The desired outcome of this project is a measurable improvement in search performance, demonstrated using stock spelling correction tests and data extracted from query logs. Determining whether we achieve an improvement will require testing and analysis of sets of collected queries before and after the application of NLP in the analysis pipeline.
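As an illustration only, a before-and-after comparison of this kind could be sketched as follows. This is a minimal sketch, not a prescribed methodology: the test pairs, the `baseline` and `candidate` functions, and the exact-match accuracy metric are all hypothetical assumptions, and a real evaluation would use stock spelling correction test sets and sampled query logs.

```python
from typing import Callable, Iterable, Tuple


def correction_accuracy(
    pairs: Iterable[Tuple[str, str]],
    corrector: Callable[[str], str],
) -> float:
    """Fraction of queries whose corrected form equals the intended query."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = sum(1 for typed, intended in pairs if corrector(typed) == intended)
    return hits / len(pairs)


# Hypothetical (typed query, intended query) pairs, for illustration only.
TEST_PAIRS = [
    ("albert einstien", "albert einstein"),
    ("photosynthesis", "photosynthesis"),
    ("recieve", "receive"),
    ("mississipi river", "mississippi river"),
]


def baseline(query: str) -> str:
    """'Before' condition: the query is passed through unchanged."""
    return query


# Hypothetical stand-in for an NLP-based corrector (assumption, not a real API).
HYPOTHETICAL_FIXES = {
    "albert einstien": "albert einstein",
    "recieve": "receive",
}


def candidate(query: str) -> str:
    """'After' condition: apply the stand-in corrector."""
    return HYPOTHETICAL_FIXES.get(query, query)


before = correction_accuracy(TEST_PAIRS, baseline)
after = correction_accuracy(TEST_PAIRS, candidate)
improvement = after - before
```

Running both conditions over the same fixed query set keeps the comparison fair; other metrics, such as the share of queries returning few or no results, could be substituted for exact-match accuracy.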
WMF Sponsor: Erika Bjune, Director of Engineering, Search Platform
Technical Lead: Erik Bernhardson, Search Platform
Engineering Support: David Causse, Trey Jones, Guillaume Lederrey
Timeline and Cost Estimates:
Ideally, we are hoping to get this initial component completed in a three-month time frame, but please indicate in your proposal how long you think the work will take, along with your cost estimate and/or hourly rate.
To summarize, your proposal should include:
- A resume or CV and a statement of your prior experience with NLP;
- A summary of the steps you would take in approaching the work, including desired programming language(s) and infrastructure expectations;
- An estimate of the time required; and
- Either a fixed cost or hourly rate and estimated total hours to complete the project.
If you have clarifying questions you would like answered before submitting a proposal, feel free to send them to Erika Bjune at firstname.lastname@example.org, and we’d be glad to answer them. When you are ready to submit your proposal, send it to Erika at that same email, and we’ll be in touch.
The Wikimedia Foundation is...
...the nonprofit organization that supports Wikipedia and the other Wikimedia free knowledge projects. Our vision is a world in which every single human can freely share in the sum of all knowledge. We believe that everyone has the potential to contribute something to our shared knowledge, and that everyone should be able to access that knowledge, free of interference. We host the Wikimedia projects, build software experiences for reading, contributing, and sharing Wikimedia content, support the volunteer communities and partners who make Wikimedia possible, and advocate for policies that enable Wikimedia and free knowledge to thrive. The Wikimedia Foundation is a charitable, not-for-profit organization that relies on donations. We receive financial support from millions of individuals around the world, with an average donation of about $15. We also receive donations through institutional grants and gifts. The Wikimedia Foundation is a United States 501(c)(3) tax-exempt organization with offices in San Francisco, California, USA.
The Wikimedia Foundation is an equal opportunity employer, and we encourage people with a diverse range of backgrounds to apply.