Creating Code Search Sitemaps

by Geethalakshmi 2010-06-17 14:56:51



A Code Search Sitemap uses the Sitemap protocol, with additional Code Search-specific tags as defined below. Here is a sample Code Search Sitemap with two entries that use Code Search-specific tags:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:codesearch="http://www.google.com/codesearch/schemas/sitemap/1.0">
  <url>
    <loc>http://mysite.org/download/myfile.c</loc>
    <codesearch:codesearch>
      <codesearch:filetype>C</codesearch:filetype>
      <codesearch:license>LGPL</codesearch:license>
    </codesearch:codesearch>
  </url>

  <url>
    <loc>http://mysite.org/download/myproject.tgz</loc>
    <codesearch:codesearch>
      <codesearch:filetype>archive</codesearch:filetype>
      <codesearch:license>Apache</codesearch:license>
      <codesearch:packagemap>packagemap.xml</codesearch:packagemap>
    </codesearch:codesearch>
  </url>
</urlset>


Each URL in a Code Search Sitemap can point to an archive file or a code file.

Code Search-specific tag definitions




<codesearch:filetype>
Required

Case-insensitive. The value "archive" indicates that the file is an archive file. For source code files, the value specifies the source code language; examples include "C", "Python", "C#", "Java", and "Vim". For a source code language, you must use the Short Name as specified in the list of supported languages. The value must consist of printable ASCII characters and must not contain white space.

Only supported languages will be indexed. If the language of your code is not yet supported, you can still submit the Sitemap and Google may index your code in the future.

<codesearch:license>
Optional

Case-insensitive. The name of the software license. For archive files, this indicates the default license for files in the archive. Examples include "GPL", "BSD", "Python", "disclaimer". You must use the Short Name, as specified in the list of supported licenses.

If the value is not one of the recognized licenses, the item is indexed as "unknown license".

<codesearch:filename>
Optional

The name of the actual file. This is useful if the URL ends in something like download.php?id=1234 instead of the actual filename. The name can contain any character except "/". If the file is an archive file, it is indexed only if its name has one of the supported archive suffixes.
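
For illustration, here is a hypothetical entry for a file served through a download script, where the URL does not reveal the file's real name. The URL, file name, and license are made up for this sketch; only the tags come from the definitions above, and the child elements follow the order used in the sample Sitemap.

```xml
<url>
  <!-- Hypothetical download URL whose path does not end in the real file name -->
  <loc>http://mysite.org/download.php?id=1234</loc>
  <codesearch:codesearch>
    <codesearch:filetype>Python</codesearch:filetype>
    <codesearch:license>BSD</codesearch:license>
    <!-- The actual name of the file; it must not contain "/" -->
    <codesearch:filename>myscript.py</codesearch:filename>
  </codesearch:codesearch>
</url>
```

This fragment belongs inside a <urlset> element that declares the codesearch namespace, as in the sample above. If a query string contains more than one parameter, the "&" separator must be escaped as "&amp;" inside <loc>, as in any XML Sitemap.
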

<codesearch:packageurl>
Optional

For use only when the value of codesearch:filetype is not "archive". The URL truncated at the package's top-level directory. For example, the file http://path/Foo/1.23/bar/file.c could have the packageurl http://path/Foo/1.23. All files in a package should have the same packageurl; this tells us which files belong together.

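
As a sketch of how packageurl groups files, the following hypothetical entries list two C files from version 1.23 of a package called Foo. Both carry the same codesearch:packageurl, which is each file's URL truncated at the package's top-level directory. The host and file names are assumptions chosen to mirror the example in the definition.

```xml
<url>
  <loc>http://mysite.org/Foo/1.23/bar/file.c</loc>
  <codesearch:codesearch>
    <codesearch:filetype>C</codesearch:filetype>
    <!-- Top-level directory of the package; shared by every file in it -->
    <codesearch:packageurl>http://mysite.org/Foo/1.23</codesearch:packageurl>
  </codesearch:codesearch>
</url>
<url>
  <loc>http://mysite.org/Foo/1.23/baz/util.c</loc>
  <codesearch:codesearch>
    <codesearch:filetype>C</codesearch:filetype>
    <!-- Same packageurl, so this file is grouped with file.c above -->
    <codesearch:packageurl>http://mysite.org/Foo/1.23</codesearch:packageurl>
  </codesearch:codesearch>
</url>
```
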
<codesearch:packagemap>
Optional

Case-sensitive. For use only when codesearch:filetype is "archive". The name of the packagemap file inside the archive. Just as a Sitemap is a list of files on a web site, a packagemap is a list of files in a package.

