There was a park in the Virginia town where I used to live that had once been a railroad track and was turned into a walking path. Near that track was a historic turntable, where cargo trains might be unloaded so that their cars could be added to later trains or to trains headed in the opposite direction. It is a technology that is no longer used, but it is an example of how technology changes and evolves over time.

Some people who write about SEO have insisted that Google uses a technology called Latent Semantic Indexing (LSI) to index content on the Web, but they make those claims without any proof to back them up. I thought it might be helpful to explore that technology and its sources in more detail. LSI was invented before the Web existed, to index the contents of document collections that don’t change much. It may be a lot like the railroad turntables that used to be used on railroad lines.

There is also a website that offers “LSI keywords” to searchers, but it doesn’t explain how it generates those keywords or whether it uses LSI technology to generate them, and it doesn’t provide any proof that they make a difference in how a search engine such as Google might index content that contains them. How is using “LSI keywords” different from the keyword stuffing that Google tells us not to do? Google tells us that we should:
Where does LSI come from?

Susan Dumais, one of Microsoft’s researchers and search engineers, was an inventor behind a technology referred to as Latent Semantic Indexing, which she worked on developing at Bell Labs. Her home page links to many of the technologies she worked on while doing research at Microsoft; they are very informative and provide many insights into how search engines perform different tasks. Spending time with them is highly recommended. Before joining Microsoft, she did earlier research at Bell Labs, including writing about Indexing by Latent Semantic Analysis. She was also granted a patent on the process as a co-inventor. Note that this patent was filed in April of 1989 and published in August of 1992. The World Wide Web didn’t go live until August 1991. The LSI patent is: Computer information retrieval using latent semantic structure

Abstract
The problem that LSI was intended to solve:
The summary section of the patent tells us that there is a potential solution to this problem. Keep in mind that this was developed before the World Wide Web grew into the very large source of information that it is today:
To illustrate how LSI works, the patent provides a simple example using a set of 9 documents (much smaller than the Web as it exists today) about human/computer interaction topics. It doesn’t discuss how a process like this could handle something the size of the Web, because nothing that size existed yet at that point in time. The Web contains a lot of information and changes frequently, so an approach created to index a known document collection might not be ideal. The patent tells us that an analysis of terms needs to take place “each time there is a significant update in the storage files.”

There has been a lot of research and development of technology that can be applied to a set of documents the size of the Web. We learned from Google that it is using a word vector approach developed by the Google Brain team, which was described in a patent granted in 2017. I wrote about that patent, and linked to the resources it cited, in the post: Citations behind the Google Brain Word Vector Approach. If you want to get a sense of the technologies that Google may be using to index content and understand the words in that content, keep in mind that they have advanced a lot since the days just before the Web started. That patent contains links to papers cited by its inventors, and some of those may be related in some ways to Latent Semantic Indexing, which could be called their ancestor.

The LSI technology that was invented in 1988 contains some interesting approaches, and if you want to learn a lot more about it, this paper is really insightful: A Solution to Plato’s Problem: The Latent Semantic Analysis Theory of Acquisition, Induction and Representation of Knowledge. Latent Semantic Indexing is also mentioned in some patents from Google, where it is used as an example of an indexing method:
~ Classifying text into hierarchical categories

Copyright © 2018 SEO by the Sea ⚓.

The post Does Google Use Latent Semantic Indexing? appeared first on SEO by the Sea ⚓. from http://www.seobythesea.com/2018/01/google-use-latent-semantic-indexing/
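To make the patent’s nine-document example more concrete, here is a minimal Python sketch of the LSI idea. It is an illustration under stated assumptions, not the patent’s implementation. It builds a small term-document matrix from nine short titles modeled on the example in the Indexing by Latent Semantic Analysis paper (five about human/computer interaction, four about graph theory), keeps the top k = 2 singular dimensions of that matrix, folds a query into the reduced space, and ranks the titles by cosine similarity. To stay dependency-free, it uses power iteration on AᵀA in place of a full SVD routine, and it uses raw term counts with no weighting. The titles, the twelve index terms, and the function names (`term_vector`, `top_singular_vectors`, `lsi_scores`, `keyword_overlap`) are hypothetical choices made for this sketch.

```python
import math
import re

# Hypothetical toy collection, modeled on the nine-title example in
# "Indexing by Latent Semantic Analysis": c1-c5 are about human/computer
# interaction, m1-m4 about graph theory.
DOCS = {
    "c1": "Human machine interface for ABC computer applications",
    "c2": "A survey of user opinion of computer system response time",
    "c3": "The EPS user interface management system",
    "c4": "System and human system engineering testing of EPS",
    "c5": "Relation of user perceived response time to error measurement",
    "m1": "The generation of random, binary, ordered trees",
    "m2": "The intersection graph of paths in trees",
    "m3": "Graph minors IV: Widths of trees and well-quasi-ordering",
    "m4": "Graph minors: A survey",
}

# Assumed index vocabulary: terms that occur in more than one title.
TERMS = ["human", "interface", "computer", "user", "system", "response",
         "time", "eps", "survey", "trees", "graph", "minors"]


def term_vector(text):
    """Raw counts of each index term in `text` (no weighting, for simplicity)."""
    words = re.findall(r"[a-z]+", text.lower())
    return [words.count(term) for term in TERMS]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0


def top_singular_vectors(cols, k, iters=1000):
    """Top-k singular values and right singular vectors of A.

    `cols` are the columns of the term-document matrix A (one per document).
    The right singular vectors are the top eigenvectors of A^T A, found here
    by power iteration, re-orthogonalizing against earlier vectors each step.
    """
    n = len(cols)
    gram = [[dot(cols[i], cols[j]) for j in range(n)] for i in range(n)]
    sigmas, vs = [], []
    for _ in range(k):
        v = [1.0 + 0.01 * j for j in range(n)]  # arbitrary starting vector
        for _ in range(iters):
            w = [dot(row, v) for row in gram]  # w = (A^T A) v
            for prev in vs:  # remove components along vectors already found
                p = dot(w, prev)
                w = [wi - p * pi for wi, pi in zip(w, prev)]
            length = math.sqrt(dot(w, w))
            v = [wi / length for wi in w]
        eigenvalue = dot(v, [dot(row, v) for row in gram])  # equals sigma^2
        sigmas.append(math.sqrt(eigenvalue))
        vs.append(v)
    return sigmas, vs


def lsi_scores(query, k=2):
    """Cosine similarity between the query and each title in k-dim LSI space."""
    names = list(DOCS)
    cols = [term_vector(DOCS[name]) for name in names]
    sigmas, vs = top_singular_vectors(cols, k)
    # Document j's coordinates in the reduced space are row j of V_k.
    doc_coords = {name: [v[j] for v in vs] for j, name in enumerate(names)}
    # Fold the query in: q_hat_i = (q . u_i) / sigma_i, where u_i = A v_i / sigma_i.
    q = term_vector(query)
    q_hat = []
    for sigma, v in zip(sigmas, vs):
        a_v = [sum(col[t] * v[j] for j, col in enumerate(cols))
               for t in range(len(TERMS))]
        q_hat.append(dot(q, a_v) / sigma ** 2)
    return {name: cosine(q_hat, coords) for name, coords in doc_coords.items()}


def keyword_overlap(query):
    """Literal matching for comparison: shared index-term counts per title."""
    q = term_vector(query)
    return {name: dot(q, term_vector(text)) for name, text in DOCS.items()}


if __name__ == "__main__":
    query = "human computer interaction"
    overlap = keyword_overlap(query)
    ranked = sorted(lsi_scores(query).items(), key=lambda kv: -kv[1])
    for name, score in ranked:
        print(f"{name}  lsi_cosine={score:+.2f}  shared_terms={overlap[name]}")
```

Running it for the query “human computer interaction” shows what the technique is for: the title “The EPS user interface management system” shares no index terms with the query, so literal keyword matching scores it zero, yet in the two-dimensional latent space it lands close to the query, because it shares terms such as user, interface, and system with titles that do contain human or computer. The graph-theory titles score much lower. Note that the whole decomposition has to be recomputed whenever the collection changes significantly, which is the limitation the patent itself points to.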