Project Description:

The Search Platform team at the Wikimedia Foundation (WMF) is seeking an expert consultant to help us investigate and implement Natural Language Processing (NLP) in our query analysis pipeline. We are soliciting proposals to add specific components in two main NLP areas:

  1. Analyzing the spelling mistakes people make when querying and providing results based on the corrected spelling; and
  2. Improving our “Did You Mean” suggestions, which offer search options similar to the determined query intent when a query returns few or no results.
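
As an illustration only of the kind of behavior the two areas above describe, here is a minimal sketch of a token-level "Did You Mean" suggestion. It is not our current implementation. The vocabulary, the `did_you_mean` function name, and the similarity cutoff are hypothetical, and the string similarity comes from Python's standard `difflib` module rather than from any NLP technique a respondent might propose.

```python
from difflib import get_close_matches

# Hypothetical vocabulary of known terms; a real system would derive this
# from indexed content or query logs.
VOCABULARY = ["wikipedia", "wikimedia", "elasticsearch", "language", "search"]


def did_you_mean(query, vocabulary=VOCABULARY, cutoff=0.8):
    """Return a corrected query, or None when no token was changed."""
    corrected = []
    changed = False
    for token in query.lower().split():
        if token in vocabulary:
            # Known token: keep it as typed.
            corrected.append(token)
            continue
        # Pick the single most similar vocabulary term above the cutoff.
        matches = get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
        if matches:
            corrected.append(matches[0])
            changed = True
        else:
            # Nothing similar enough: leave the token untouched.
            corrected.append(token)
    return " ".join(corrected) if changed else None


suggestion = did_you_mean("wikipedai serch")
```

A production approach would also need to handle word frequency, context, and languages without whitespace-delimited tokens, none of which this sketch attempts.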

Our current query analysis pipeline uses a set of algorithms to break queries down into tokens, which are then further processed to determine, as best we can, the intent of the query so that we can provide the most relevant and best-ranked results across almost 300 languages. This is aided by a machine learning component in the form of a learning-to-rank plugin for Elasticsearch for the top 19 languages (by search volume). Adding NLP to our analysis chain will help us achieve greater search satisfaction, and this project will lay the foundations for more NLP work in the future.

Project Requirements:

Ideally, the NLP work for this project should be either developed as a PHP module or encapsulated in its own Elasticsearch plugin, which will be incorporated into the pipeline with the help of the Search Platform engineering staff. We are open to other ideas as well, as long as we see a path to incorporating this work into the query analysis pipeline we currently run. The specific programming language(s) can be Java and/or PHP, and possibly Python if there is a need, and we would be willing to entertain other options, as long as we can safely incorporate them into our Elasticsearch-centered ecosystem of components. To be clear, however, Elasticsearch experience is not required, as we can help with any integrations required on that end.

Most importantly, respondents should have previous experience applying NLP to search and/or spelling correction. Beyond that, it would be great if you have experience with Elasticsearch, and with building testing and analysis components to help determine the effectiveness of query results as NLP techniques are applied. In your proposal, please indicate your prior experience and briefly summarize how you plan to approach this work, including your preferred programming language(s) and any expectations you have about the infrastructure required to support your direction.

We are fiercely dedicated to open source software at WMF, and all work completed needs to be made available under open source licensing. No closed-source or proprietary solutions will be considered.

Desired Outcome:

The desired outcome of this project is a measurable improvement in search performance, assessed using stock spelling correction tests and data extracted from query logs. Determining whether we achieve an improvement will require testing and analysis of sets of collected queries before and after NLP is applied in the analysis pipeline.
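
To make the before-and-after comparison concrete, here is a minimal sketch of two measurements that could be run on a collected query set. The specific metrics (exact-match correction accuracy and zero-result rate), the function names, and the toy data are all assumptions for illustration. This posting does not prescribe particular metrics, and respondents are free to propose their own.

```python
def correction_accuracy(pairs, correct):
    """Fraction of (misspelled, expected) pairs that `correct` fixes exactly."""
    if not pairs:
        return 0.0
    hits = sum(1 for misspelled, expected in pairs
               if correct(misspelled) == expected)
    return hits / len(pairs)


def zero_result_rate(queries, search):
    """Fraction of queries for which `search` returns no results."""
    if not queries:
        return 0.0
    empty = sum(1 for query in queries if not search(query))
    return empty / len(queries)


# Hypothetical toy data standing in for a stock spelling test set.
test_pairs = [("serch", "search"), ("langauge", "language")]

# Hypothetical correctors: a baseline that changes nothing, and a candidate
# that knows one fix.
baseline = lambda q: q
candidate = lambda q: {"serch": "search"}.get(q, q)

before = correction_accuracy(test_pairs, baseline)   # 0 of 2 fixed
after = correction_accuracy(test_pairs, candidate)   # 1 of 2 fixed
```

Running the same query set through both functions before and after a change gives a pair of numbers that can be compared directly.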

Stakeholders:

WMF Sponsor: Erika Bjune, Director of Engineering, Search Platform

Technical Lead: Erik Bernhardson, Search Platform

Engineering Support: David Causse, Trey Jones, Guillaume Lederrey

Timeline and Cost Estimates:

Ideally, we are hoping to get this initial component completed in a three-month time frame, but please indicate in your proposal how long you think the work will take, along with your cost estimate and/or hourly rate.

Submissions:

To summarize, your proposal should include:

  • A resume or CV and a statement of your prior experience with NLP;
  • A summary of the steps you would take in approaching the work, including desired programming language(s) and infrastructure expectations;
  • An estimate of the time required; and
  • Either a fixed cost or hourly rate and estimated total hours to complete the project.

If you have clarifying questions you would like answered before submitting a proposal, feel free to send them to Erika Bjune at ebjune@wikimedia.org, and we’d be glad to answer them. When you are ready to submit your proposal, send it to Erika at that same email, and we’ll be in touch.

The Wikimedia Foundation is... 

...the nonprofit organization that supports Wikipedia and the other Wikimedia free knowledge projects. Our vision is a world in which every single human can freely share in the sum of all knowledge. We believe that everyone has the potential to contribute something to our shared knowledge, and that everyone should be able to access that knowledge, free of interference. We host the Wikimedia projects, build software experiences for reading, contributing, and sharing Wikimedia content, support the volunteer communities and partners who make Wikimedia possible, and advocate for policies that enable Wikimedia and free knowledge to thrive. The Wikimedia Foundation is a charitable, not-for-profit organization that relies on donations. We receive financial support from millions of individuals around the world, with an average donation of about $15. We also receive donations through institutional grants and gifts. The Wikimedia Foundation is a United States 501(c)(3) tax-exempt organization with offices in San Francisco, California, USA.

The Wikimedia Foundation is an equal opportunity employer, and we encourage people with a diverse range of backgrounds to apply.

 






U.S. Equal Opportunity Employment Information (Completion is voluntary)

Individuals seeking employment at Wikimedia Foundation are considered without regard to race, color, religion, national origin, age, sex, marital status, ancestry, physical or mental disability, veteran status, gender identity, or sexual orientation. You are being given the opportunity to provide the following information in order to help us comply with federal and state Equal Employment Opportunity/Affirmative Action record keeping, reporting, and other legal requirements.

Completion of the form is entirely voluntary. Whatever your decision, it will not be considered in the hiring process or thereafter. Any information that you do provide will be recorded and maintained in a confidential file.


If you believe you belong to any of the categories of protected veterans listed below, please indicate by making the appropriate selection. As a government contractor subject to the Vietnam Era Veterans' Readjustment Assistance Act (VEVRAA), we request this information in order to measure the effectiveness of the outreach and positive recruitment efforts we undertake pursuant to VEVRAA. Classification of protected categories is as follows:

A "disabled veteran" is one of the following: a veteran of the U.S. military, ground, naval or air service who is entitled to compensation (or who but for the receipt of military retired pay would be entitled to compensation) under laws administered by the Secretary of Veterans Affairs; or a person who was discharged or released from active duty because of a service-connected disability.

A "recently separated veteran" means any veteran during the three-year period beginning on the date of such veteran's discharge or release from active duty in the U.S. military, ground, naval, or air service.

An "active duty wartime or campaign badge veteran" means a veteran who served on active duty in the U.S. military, ground, naval or air service during a war, or in a campaign or expedition for which a campaign badge has been authorized under the laws administered by the Department of Defense.

An "Armed forces service medal veteran" means a veteran who, while serving on active duty in the U.S. military, ground, naval or air service, participated in a United States military operation for which an Armed Forces service medal was awarded pursuant to Executive Order 12985.


Form CC-305

OMB Control Number 1250-0005

Expires 1/31/2020

Voluntary Self-Identification of Disability

Why are you being asked to complete this form?

Because we do business with the government, we must reach out to, hire, and provide equal opportunity to qualified people with disabilities¹. To help us measure how well we are doing, we are asking you to tell us if you have a disability or if you ever had a disability. Completing this form is voluntary, but we hope that you will choose to fill it out. If you are applying for a job, any answer you give will be kept private and will not be used against you in any way.

If you already work for us, your answer will not be used against you in any way. Because a person may become disabled at any time, we are required to ask all of our employees to update their information every five years. You may voluntarily self-identify as having a disability on this form without fear of any punishment because you did not identify as having a disability earlier.

How do I know if I have a disability?

You are considered to have a disability if you have a physical or mental impairment or medical condition that substantially limits a major life activity, or if you have a history or record of such an impairment or medical condition.

Disabilities include, but are not limited to:

  • Blindness
  • Deafness
  • Cancer
  • Diabetes
  • Epilepsy
  • Autism
  • Cerebral palsy
  • HIV/AIDS
  • Schizophrenia
  • Muscular dystrophy
  • Bipolar disorder
  • Major depression
  • Multiple sclerosis (MS)
  • Missing limbs or partially missing limbs
  • Post-traumatic stress disorder (PTSD)
  • Obsessive compulsive disorder
  • Impairments requiring the use of a wheelchair
  • Intellectual disability (previously called mental retardation)

Reasonable Accommodation Notice

Federal law requires employers to provide reasonable accommodation to qualified individuals with disabilities. Please tell us if you require a reasonable accommodation to apply for a job or to perform your job. Examples of reasonable accommodation include making a change to the application process or work procedures, providing documents in an alternate format, using a sign language interpreter, or using specialized equipment.

¹ Section 503 of the Rehabilitation Act of 1973, as amended. For more information about this form or the equal employment obligations of Federal contractors, visit the U.S. Department of Labor's Office of Federal Contract Compliance Programs (OFCCP) website at www.dol.gov/ofccp.

PUBLIC BURDEN STATEMENT: According to the Paperwork Reduction Act of 1995 no persons are required to respond to a collection of information unless such collection displays a valid OMB control number. This survey should take about 5 minutes to complete.