source: proiecte/HadoopJUnit/hadoop-0.20.1/src/docs/cn/src/documentation/content/xdocs/hadoop_archives.xml @ 120

Last change on this file since 120 was 120, checked in by (none), 14 years ago

Added the mail files for the Hadoop JUNit Project

<?xml version="1.0"?>
<!--
  Copyright 2002-2004 The Apache Software Foundation

  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
-->
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
<document>
        <header>
        <title>Hadoop Archives</title>
        </header>
        <body>
        <section>
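        <title> What are Hadoop archives? </title>
        <p>
        Hadoop archives are archives in a special format. A Hadoop archive maps to a file system directory.
        A Hadoop archive always has a *.har extension. The archive directory contains metadata files (_index and _masterindex) and data files (part-*). The _index file records the names of the files in the archive and their locations within the part files.
        </p>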
        </section>
        <section>
        <title> How to create an archive? </title>
        <p>
        <code>Usage: hadoop archive -archiveName name &lt;src&gt;* &lt;dest&gt;</code>
        </p>
        <p>
        The -archiveName option gives the name of the archive you want to create, for example foo.har. The name should have a *.har extension. The inputs are file system pathnames, written in the usual form. The archive is created under the destination directory. Note that creating an archive is a Map/Reduce job, so you need to run this command on a Map/Reduce cluster. The following is an example:
        </p>
        <p>
        <code>hadoop archive -archiveName foo.har /user/hadoop/dir1 /user/hadoop/dir2 /user/zoo/</code>
        </p><p>
        In the example above,
        /user/hadoop/dir1 and /user/hadoop/dir2 are archived into the following file system directory:
        /user/zoo/foo.har. The source files are not changed or removed when the archive is created.
        </p>
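        <p>
        As a rough illustration of how the archive name and the destination directory combine, the shell sketch below assembles the command line and the resulting archive location as plain strings, without running Hadoop. The functions archive_location and build_archive_cmd are hypothetical helpers introduced here for illustration and are not part of Hadoop; only the command syntax shown above comes from this document.
        </p>
        <source>
```shell
#!/bin/sh
# Hypothetical helpers, not part of Hadoop: they only build strings.

# archive_location NAME DEST
# Prints the directory the archive would occupy (DEST/NAME).
# Returns 1 without printing if NAME lacks the .har extension.
archive_location() {
    name=$1
    dest=${2%/}          # drop one trailing slash, if any
    case $name in
        *.har) printf '%s/%s\n' "$dest" "$name" ;;
        *)     return 1 ;;
    esac
}

# build_archive_cmd NAME SRC... DEST
# Prints the hadoop archive command line; it does not execute it.
# Paths are joined with single spaces and are not shell-quoted.
build_archive_cmd() {
    name=$1
    shift
    case $name in
        *.har) ;;
        *)     return 1 ;;
    esac
    printf 'hadoop archive -archiveName %s' "$name"
    for path in "$@"; do
        printf ' %s' "$path"
    done
    printf '\n'
}

build_archive_cmd foo.har /user/hadoop/dir1 /user/hadoop/dir2 /user/zoo/
archive_location foo.har /user/zoo/
```
        </source>
        <p>
        With the arguments from the example, the first call prints the same command line as above and the second prints /user/zoo/foo.har. Rejecting names without the .har extension is a choice made for this sketch, based on the naming rule stated above.
        </p>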
        </section>
        <section>
        <title> How to look up files in archives? </title>
        <p>
        An archive exposes itself as a file system layer, so all fs shell commands work on archives, but with a different URI.
        Note also that archives are immutable, so renames, deletes and creates return an error. The URI for Hadoop Archives is
        </p><p><code>har://scheme-hostname:port/archivepath/fileinarchive</code></p><p>
        If scheme-hostname is not provided, the default file system is used. In that case the URI has the form
        </p><p><code>
        har:///archivepath/fileinarchive</code></p>
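        <p>
        The two URI forms above can be sketched as a small shell function. This is only an illustration: har_uri is a hypothetical helper, not a Hadoop command, and the host name and port used in the second call are made-up values.
        </p>
        <source>
```shell
#!/bin/sh
# Hypothetical helper, not part of Hadoop: builds a har URI string.

# har_uri ARCHIVE_PATH FILE_IN_ARCHIVE [SCHEME HOST PORT]
# ARCHIVE_PATH is an absolute path such as /user/hadoop/foo.har.
# With SCHEME, HOST and PORT the authority is scheme-hostname:port;
# without them the authority is left empty (default file system).
har_uri() {
    archive=$1
    file=${2#/}               # drop one leading slash, if any
    if [ "$#" -ge 5 ]; then
        authority="$3-$4:$5"
    else
        authority=""
    fi
    if [ -n "$file" ]; then
        printf 'har://%s%s/%s\n' "$authority" "$archive" "$file"
    else
        # No file given: address the archive root itself.
        printf 'har://%s%s\n' "$authority" "$archive"
    fi
}

har_uri /user/hadoop/foo.har dir/filea
# Assumed example values for the underlying file system:
har_uri /user/hadoop/foo.har dir/filea hdfs namenode.example.com 8020
```
        </source>
        <p>
        The first call prints har:///user/hadoop/foo.har/dir/filea; the second prints har://hdfs-namenode.example.com:8020/user/hadoop/foo.har/dir/filea.
        </p>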
        <p>
        Here is an example of an archive. The input to the archive is /dir. The directory dir contains the files filea and fileb.
        The command to archive /dir to /user/hadoop/foo.har is
        </p>
        <p><code>hadoop archive -archiveName foo.har /dir /user/hadoop</code>
        </p><p>
        To list the files in the created archive, use the command
        </p>
        <p><code>hadoop dfs -lsr har:///user/hadoop/foo.har</code></p>
        <p>To cat the file filea in the archive, use the command
        </p><p><code>hadoop dfs -cat har:///user/hadoop/foo.har/dir/filea</code></p>
        </section>
        </body>
</document>