Dr. Jim Jansen, Assistant Professor at Pennsylvania State University, explains what relevant information can be found about your customers using search logs.
One of my conference presentations, To What Degree Can Log Data Profile a Web Searcher?, covers research on building a profile of a Web searcher using just the few fields in a search log.
Typical search logs really don’t have that many fields, a dozen or so at most. Even when one enriches the logs (i.e., combining fields to make new ones of greater insight), the logs are still sparse.
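To make the idea of enrichment concrete, here is a minimal sketch in Python. The field names (`ip`, `user_agent`, `timestamp`, `query`) and the derived fields are assumptions chosen for illustration, not the schema of any particular search log or the exact enrichment used in the research.

```python
from datetime import datetime

def enrich(record):
    """Return a copy of a raw log record with a few derived fields added.

    The input field names are hypothetical; real logs differ by search engine.
    """
    enriched = dict(record)

    # Query length in terms: a common derived field in log analysis.
    enriched["query_length"] = len(record["query"].split())

    # Temporal fields derived from the timestamp (ISO 8601 string assumed).
    ts = datetime.fromisoformat(record["timestamp"])
    enriched["hour_of_day"] = ts.hour
    enriched["is_weekend"] = ts.weekday() >= 5  # Monday is 0, Sunday is 6

    # Combine two raw fields into a rough key for grouping records by user.
    enriched["user_key"] = f'{record["ip"]}|{record["user_agent"]}'
    return enriched

raw = {
    "ip": "192.0.2.1",  # documentation-only address
    "user_agent": "ExampleBrowser/1.0",
    "timestamp": "2009-03-14T21:05:00",
    "query": "cheap flights to boston",
}
enriched = enrich(raw)
print(enriched)
```

Even with these extra fields, the record stays small, which is the point: the profile has to be built from very little.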
So, in this research, we set out to determine what we could find out about a user based on a record or so in a search log. We found out that we could actually determine a lot!
Here are some examples.
- With the IP address, we can pinpoint the user's location fairly accurately. From geo-targeting data in the query, we can tell the location of focus.
- From the query, we can do a topic analysis to determine the searching focus.
- We can use algorithms already available to determine what type of content is desired (e.g., informational, navigational, transactional).
- We can determine whether the query has commercial intent and where the user is in the buying cycle.
- Using session-level data, we can determine the level of engagement (i.e., how motivated the user is to get the desired content).
- Based on pattern analysis, we can determine with a fair degree of accuracy what the next query reformulation will be.
- Using out-of-log sources, such as http://adlab.microsoft.com/Demographics-Prediction/, we can determine the gender bias of the query.
- Using neural network techniques, we can determine the probability of the user clicking on a result listing.
- With enough temporal data, we can reasonably determine the identity of the searcher.
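Two of the items above, content type and session-level engagement, can be roughed out in a few lines of Python. This is only a sketch under stated assumptions: the keyword list, the URL pattern, and the function names are made up for illustration, and the rules are a simple heuristic, not the published algorithms the list refers to.

```python
import re

# Hypothetical cue words suggesting the user wants to obtain something.
TRANSACTIONAL_TERMS = {"buy", "download", "order", "purchase", "price", "cheap", "coupon"}

# Hypothetical cue: the query looks like a site address.
NAVIGATIONAL_PATTERN = re.compile(r"(www\.|\.com|\.org|\.net|\.edu)")

def classify_intent(query):
    """Label a query as navigational, transactional, or informational."""
    q = query.lower()
    if NAVIGATIONAL_PATTERN.search(q):
        return "navigational"
    if TRANSACTIONAL_TERMS & set(q.split()):
        return "transactional"
    # Default class: anything without a navigational or transactional cue.
    return "informational"

def session_engagement(queries):
    """Summarize one session, given its queries in the order they were issued."""
    normalized = [q.lower().strip() for q in queries]
    # Count each step where the query differs from the one before it.
    reformulations = sum(1 for a, b in zip(normalized, normalized[1:]) if a != b)
    return {"queries": len(normalized), "reformulations": reformulations}

print(classify_intent("www.psu.edu"))
print(classify_intent("buy running shoes"))
print(classify_intent("history of web search"))
print(session_engagement(["jaguar", "jaguar car", "jaguar car price"]))
```

A production classifier would use many more signals (clicks, result pages viewed, time between queries), but even this crude version shows how much a single query string and a short session can give away.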
I am not a big online privacy dude, but the results surprised me when you put all the techniques together. And the techniques are just going to get better and better.
There are some really interesting aspects for advertisers, great possibilities for consumers, and concerns for all.
Dr. Jim Jansen is an associate professor in the College of Information Sciences and Technology at Pennsylvania State University. Jim has more than 150 publications in the area of information technology and systems, with articles appearing in a multi-disciplinary range of journals and conferences. His specific areas of expertise are Web searching, sponsored search, and personalization for information searching. He is co-author of the book Web Search: Public Searching of the Web and co-editor of the book Handbook of Weblog Analysis.

