A Little Background
Over the years, we’ve worked with a lot of academic institutions, public libraries, and commercial companies. Guided search and machine learning are both methods that we use to design better digital experiences. Bluespark Senior Developer Pablo Cerda and Bluespark UX Director Mark Dodgson teamed up to see how we might push the boundaries on these methods.
May 2018
This was their first formal meeting to discuss how to start exploring the idea of guided search and machine learning. “We look pretty happy on day one,” Mark says.
The Non-Technical Perspective
What Does This All Mean? It means that we’re improving the way you can look for and ultimately find information.
Let’s say you’re a university librarian and you’re looking for resources on the history of Barbie and Mattel. You search using the phrase, “history of Barbie and Mattel.” As a user, you don’t want the computer to give you all the resources that include the words “of”, “and”, or even “history.” You would be overloaded with irrelevant information.
But you do want the computer system to suggest resources that you might not have thought of before, such as a category on toy company founders or one on plastic molding. These categories give you more choices, and as the user you can then direct the system on which path you want to take. The machine helps guide you but it does not assume your destination. Think of it as a bit of “choose your own adventure.”
A process like guided search gives users more control and helps them achieve a more comprehensive end result.
For example, if a user performs a search for Python, traditional search would return results for Python (the snake), Python (the programming language), and Python (a band, song, agency, etc.). Guided search takes the requested search term and displays questions to the user to further narrow the results. In the case of the search term “Python” we might display the following questions:
- Do you mean Python Programming?
- Do you mean Python the snake?
Once the user selects a question, the system can present more relevant results, leading to a better user experience.
Additionally, computer systems can “learn” from these patterns. More on this later, though.
The Technical Perspective
Mark and Pablo started by discussing how to run simple tests with a database (and what search terms to use to provide context). Here are some of the questions they asked themselves:
- How can we learn from the results? How can we use these results to offer suggestions and help write guided questions?
- Can we use machine learning to create the initial categories and then have machine learning help write questions from those categories?
- What database sources can we use for testing?
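To make the idea of writing questions from categories concrete, here is a minimal sketch in Python. It is not the project's code: the `CATEGORIES` table and the `guided_questions` function are hypothetical names, and the table is filled in by hand where the real system would presumably learn its categories from the data.

```python
# Hypothetical category index: search term -> candidate meanings.
# Hand-written here; the goal is for machine learning to produce it.
CATEGORIES = {
    "python": ["Python programming", "Python the snake"],
}


def guided_questions(term, categories=CATEGORIES):
    """Return one clarifying question per known category for the term."""
    meanings = categories.get(term.strip().lower(), [])
    return [f"Do you mean {meaning}?" for meaning in meanings]


questions = guided_questions("Python")
for question in questions:
    print(question)
```

A term with no known categories simply yields no questions, so the search can fall back to ordinary results.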
Pablo and Mark discussed a couple of approaches to search options, such as:
- Profile-based guided search: The user is logged in, or links social media accounts, to help add relevancy to the results, essentially creating a profile for the results.
- Behavioral search (based on past behaviors on site): Display results based on what a user has previously viewed to add relevance and context to the results. A behavioral search would use implicit data but could include explicit data if we asked clarifying guided questions.
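One way to picture the behavioral approach is as a re-ranking step. The sketch below is an illustration under assumptions, not the approach Pablo and Mark settled on: the `rerank` function, the `(title, category)` result shape, and the sample data are all made up for the example. Previously viewed categories stand in for implicit data, and the answer to a guided question stands in for explicit data.

```python
from collections import Counter


def rerank(results, viewed_categories, chosen_category=None):
    """Order results so those matching the user's history come first.

    results: list of (title, category) pairs (hypothetical shape).
    viewed_categories: categories the user viewed before (implicit data).
    chosen_category: the answer to a guided question, if any (explicit data).
    """
    history = Counter(viewed_categories)

    def score(result):
        _, category = result
        explicit = 1 if category == chosen_category else 0
        # An explicit answer outranks history; history breaks ties.
        return (explicit, history[category])

    # sorted() is stable, so equally scored results keep their order.
    return sorted(results, key=score, reverse=True)


# Made-up sample data for illustration.
results = [
    ("Ball python care", "snake"),
    ("Python tutorial", "programming"),
    ("Monty Python", "comedy"),
]
viewed = ["programming", "programming", "snake"]

by_history = rerank(results, viewed)
by_answer = rerank(results, viewed, chosen_category="snake")
```

With history alone, the programming result moves to the top. Once the user answers a guided question with "snake", that explicit signal takes priority.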
June 2018
- Pablo created a script that scans the database and pulls every word out of every sentence and tokenizes it for use in clusters.
- Identified the need to normalize the data and filter out words that we don't need, such as "it" and "as".
- Identified the need to analyze the words to find similarities. After that, we can process the data using entity extraction (chunking, noun phrase chunking, etc.).
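The tokenizing and filtering steps above can be sketched in a few lines of Python. This is not Pablo's script, which isn't shown here. The `tokenize` and `normalize` names and the very short `STOP_WORDS` list are assumptions for illustration, and a real stop-word list would be much longer.

```python
import re

# A tiny illustrative stop-word list (an assumption, not the project's list).
STOP_WORDS = {"it", "as", "of", "and", "the", "a", "an", "to", "in", "on"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def normalize(text, stop_words=STOP_WORDS):
    """Tokenize the text, then drop the stop words."""
    return [token for token in tokenize(text) if token not in stop_words]


tokens = normalize("History of Barbie and Mattel")
print(tokens)
```

The simple `[a-z]+` pattern ignores digits and splits contractions at the apostrophe, which is good enough for a first test but would need refining for real content.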
[Figures: Information Extraction Architecture; Chunking; Noun Phrase Chunking]
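As a rough illustration of noun phrase chunking, the sketch below groups words into phrases made of an optional determiner, any number of adjectives, and one or more nouns. It assumes the words already carry Penn Treebank-style part-of-speech tags (`DT`, `JJ`, `NN`), which are assigned by hand here. In practice a tagger from an NLP library would supply them, and the `noun_phrase_chunks` function name is hypothetical.

```python
def noun_phrase_chunks(tagged):
    """Group tokens matching DT? JJ* NN+ into noun phrases.

    tagged: list of (word, tag) pairs with Penn Treebank-style tags.
    """
    chunks = []
    i = 0
    while i < len(tagged):
        j = i
        # Optional determiner ("the", "a").
        if tagged[j][1] == "DT":
            j += 1
        # Any number of adjectives.
        while j < len(tagged) and tagged[j][1].startswith("JJ"):
            j += 1
        # One or more nouns are required to form a chunk.
        noun_start = j
        while j < len(tagged) and tagged[j][1].startswith("NN"):
            j += 1
        if j > noun_start:
            chunks.append(" ".join(word for word, _ in tagged[i:j]))
            i = j
        else:
            i += 1
    return chunks


# Hand-tagged example sentence (tags are an assumption for illustration).
sentence = [
    ("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
    ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN"),
]
chunks = noun_phrase_chunks(sentence)
print(chunks)
```

Phrases like these are candidates for the entities and categories that guided questions could be built around.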
Since PHP doesn’t have strong libraries for this kind of project, Pablo and Mark are looking at other options such as R, Python, and a JavaScript UI.
Resources Used
- K-Means Clustering
- Topic Modeling ("topic Modelling")
- Machine Learning Library
- Porter Stemming Algorithm
What’s Next?
- Create word vectors and pass them to algorithms
- Review Google site search terms to write questions based on real search terms
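To show what creating word vectors and passing them to an algorithm could look like, here is a small pure-Python sketch that builds bag-of-words count vectors and clusters them with k-means. Everything in it is an assumption made for illustration: the function names, the four sample documents, and the choice to seed the centroids with the first two documents. A real project would more likely use a machine learning library and a smarter initialization.

```python
def bag_of_words(docs):
    """Turn tokenized documents into count vectors over a shared vocabulary."""
    vocab = sorted({token for doc in docs for token in doc})
    vectors = [[doc.count(term) for term in vocab] for doc in docs]
    return vocab, vectors


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(vectors, centroids, iterations=10):
    """Assign each vector to its nearest centroid, then move each centroid
    to the mean of its members; repeat for a fixed number of iterations."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        for c in range(len(centroids)):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up, already tokenized documents for illustration.
docs = [
    ["python", "code", "programming"],
    ["python", "snake", "reptile"],
    ["programming", "code", "tutorial"],
    ["snake", "reptile", "habitat"],
]
vocab, vectors = bag_of_words(docs)
labels = k_means(vectors, centroids=vectors[:2])
print(labels)
```

Here the two programming documents end up in one cluster and the two snake documents in the other, which is the kind of grouping that could seed the categories behind guided questions.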
***