Creating Code Search Sitemaps
by Geethalakshmi, 2010-06-17 14:56:51
A Code Search Sitemap uses the Sitemap protocol, with additional Code Search-specific tags as defined below. Here is a sample Code Search Sitemap with two entries, one for a C source file and one for an archive, using Code Search-specific tags:
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:codesearch="http://www.google.com/codesearch/schemas/sitemap/1.0">
  <url>
    <loc>http://mysite.org/download/myfile.c</loc>
    <codesearch:codesearch>
      <codesearch:filetype>C</codesearch:filetype>
      <codesearch:license>LGPL</codesearch:license>
    </codesearch:codesearch>
  </url>
  <url>
    <loc>http://mysite.org/download/myproject.tgz</loc>
    <codesearch:codesearch>
      <codesearch:filetype>archive</codesearch:filetype>
      <codesearch:license>Apache</codesearch:license>
      <codesearch:packagemap>packagemap.xml</codesearch:packagemap>
    </codesearch:codesearch>
  </url>
</urlset>
Each URL in a Code Search Sitemap can point to an archive file or a code file.
Code Search-specific tag definitions
Tag | Required/Optional | Description |
<codesearch:filetype> | Required | Case-insensitive. The value "archive" indicates that the file is an archive file. For source code files, the value specifies the source code language, for example "C", "Python", "C#", "Java", or "Vim". For a source code language, you must use the Short Name as specified in the list of supported languages. The value must consist of printable ASCII characters and must not contain white space. Only supported languages are indexed; if the language of your code is not yet supported, you can still submit the Sitemap, and Google may index your code in the future. |
<codesearch:license> | Optional | Case-insensitive. The name of the software license. For archive files, this indicates the default license for files in the archive. Examples include "GPL", "BSD", "Python", and "disclaimer". You must use the Short Name as specified in the list of supported licenses. If the value is not one of the recognized licenses, the item is indexed as "unknown license". |
<codesearch:filename> | Optional | The name of the actual file. This is useful when the URL ends in something like download.php?id=1234 rather than in the actual file name. The name can contain any character except "/". For an archive file, the file is indexed only if this name ends in one of the supported archive suffixes. |
<codesearch:packageurl> | Optional | For use only when the value of codesearch:filetype is not "archive". The URL truncated at the package's top-level directory. For example, the file http://path/Foo/1.23/bar/file.c could have the packageurl http://path/Foo/1.23. All files in a package should have the same packageurl; this tells us which files belong together. |
<codesearch:packagemap> | Optional | Case-sensitive. For use only when codesearch:filetype is "archive". The name of the packagemap file inside the archive. Just like a Sitemap is a list of files on a web site, a packagemap is a list of files in a package. |
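The optional tags in the table above can be illustrated with a second sample. This is a sketch, not part of the original specification. The host mysite.org follows the earlier sample, but the paths, the file names (parser.c, main.py, helpers.py), the package name Foo, and the download.php query parameters (including format=tgz) are all hypothetical. The first entry uses codesearch:filename for a file served through a download script. The next two share one codesearch:packageurl so that two loose source files are grouped into a single package. The last entry is an archive served through a script. It assumes that .tgz, as in the earlier sample, is one of the supported archive suffixes. The child-element order simply mirrors the tag table; whether Code Search requires a particular order is an assumption not settled here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:codesearch="http://www.google.com/codesearch/schemas/sitemap/1.0">
  <!-- Hypothetical: a single C file served by a download script.
       The URL does not end in a file name, so codesearch:filename supplies it. -->
  <url>
    <loc>http://mysite.org/download.php?id=1234</loc>
    <codesearch:codesearch>
      <codesearch:filetype>C</codesearch:filetype>
      <codesearch:license>GPL</codesearch:license>
      <codesearch:filename>parser.c</codesearch:filename>
    </codesearch:codesearch>
  </url>
  <!-- Hypothetical: two loose files from release 1.23 of a package "Foo".
       Both carry the same packageurl (the URL cut at the package's
       top-level directory), which marks them as belonging together. -->
  <url>
    <loc>http://mysite.org/Foo/1.23/src/main.py</loc>
    <codesearch:codesearch>
      <codesearch:filetype>Python</codesearch:filetype>
      <codesearch:license>BSD</codesearch:license>
      <codesearch:packageurl>http://mysite.org/Foo/1.23</codesearch:packageurl>
    </codesearch:codesearch>
  </url>
  <url>
    <loc>http://mysite.org/Foo/1.23/src/util/helpers.py</loc>
    <codesearch:codesearch>
      <codesearch:filetype>Python</codesearch:filetype>
      <codesearch:license>BSD</codesearch:license>
      <codesearch:packageurl>http://mysite.org/Foo/1.23</codesearch:packageurl>
    </codesearch:codesearch>
  </url>
  <!-- Hypothetical: an archive served by a script. The ampersand in the
       query string is written as an entity so the XML stays well-formed.
       The filename carries the archive suffix; packageurl is omitted
       because it applies only to non-archive files. -->
  <url>
    <loc>http://mysite.org/download.php?id=5678&amp;format=tgz</loc>
    <codesearch:codesearch>
      <codesearch:filetype>archive</codesearch:filetype>
      <codesearch:license>Apache</codesearch:license>
      <codesearch:filename>myproject.tgz</codesearch:filename>
      <codesearch:packagemap>packagemap.xml</codesearch:packagemap>
    </codesearch:codesearch>
  </url>
</urlset>
```

Because the archive entry uses codesearch:packagemap, its packagemap.xml would have to exist inside myproject.tgz under exactly that case-sensitive name.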