Hadoop and HBase Configuration Files

This Hadoop and HBase cluster has 1 NameNode and 7 DataNodes.

1. The /etc/hostname file

NameNode:

node1

DataNode 1:

node2

DataNode 2:

node3

.......

DataNode 7:

node8

2. The /etc/hosts file

NameNode:

127.0.0.1	localhost
#127.0.1.1	node1
#-------edit by HY(2014-05-04)--------
#127.0.1.1	node1
125.216.241.113 node1
125.216.241.112 node2
125.216.241.96 node3
125.216.241.111 node4
125.216.241.114 node5
125.216.241.115 node6
125.216.241.116 node7
125.216.241.117 node8
#-------end edit--------

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

DataNode 1:

127.0.0.1	localhost
#127.0.0.1	node2
#127.0.1.1	node2
#--------edit by HY(2014-05-04)--------
125.216.241.113 node1
125.216.241.112 node2
125.216.241.96 node3
125.216.241.111 node4
125.216.241.114 node5
125.216.241.115 node6
125.216.241.116 node7
125.216.241.117 node8
#-------end edit---------

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

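The other DataNodes are configured the same way. Make sure the name in /etc/hostname matches the name used for that machine in /etc/hosts. If the two differ, jobs running on the cluster fail in puzzling ways (I no longer remember the exact symptoms).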
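
The consistency check described above can be sketched in Python. This is only an illustration: the function names parse_hosts and hostname_problems are made up for this example, and flagging loopback addresses is an assumption based on the 127.0.1.1 lines being commented out in the files shown earlier.

```python
def parse_hosts(text):
    """Map each hostname in hosts-file text to the addresses it is listed under."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            mapping.setdefault(name, []).append(address)
    return mapping


def hostname_problems(hostname, hosts_text):
    """Return a list of problems; an empty list means the entry looks consistent."""
    addresses = parse_hosts(hosts_text).get(hostname, [])
    if not addresses:
        return [f"{hostname} has no entry in the hosts file"]
    # Assumption: a cluster hostname should not point at a loopback address.
    loopback = [a for a in addresses if a.startswith("127.") or a == "::1"]
    return [f"{hostname} maps to loopback address {a}" for a in loopback]
```

On a node, one would pass socket.gethostname() and the contents of /etc/hosts to hostname_problems and expect an empty list.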

3. In hadoop-env.sh, comment out the default line

# export JAVA_HOME=/usr/lib/j2sdk1.5-sun

and add

export JAVA_HOME=/usr/lib/jvm/java-6-sun

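4. core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://node1:49000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/newdata/hadoop-1.2.1/tmp</value>
  </property>
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
  </property>
  <property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>
  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>3000000</value>
  </property>
  <property>
    <name>dfs.socket.timeout</name>
    <value>3000000</value>
  </property>
</configuration>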
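
Each *-site.xml file is a flat list of name/value properties, so it is easy to read back and check the values actually in effect. A minimal Python sketch, assuming the usual property/name/value layout; load_properties and SAMPLE are names invented for this example, and the sample holds only two of the properties listed above.

```python
import xml.etree.ElementTree as ET


def load_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


# Hypothetical excerpt, reusing two values from the core-site.xml section.
SAMPLE = """<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://node1:49000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/newdata/hadoop-1.2.1/tmp</value>
  </property>
</configuration>"""

props = load_properties(SAMPLE)
print(props["fs.default.name"])  # hdfs://node1:49000
```

The same function works for hdfs-site.xml, mapred-site.xml and hbase-site.xml, since they share this layout.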
   
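5. hdfs-site.xml

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/newdata/hadoop-1.2.1/name1,/home/hadoop/newdata/hadoop-1.2.1/name2</value>
    <description>Where the filesystem metadata is stored.</description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/newdata/hadoop-1.2.1/data1,/home/hadoop/newdata/hadoop-1.2.1/data2</value>
    <description>Where the data blocks are stored.</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>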
  

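6. mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>node1:49001</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/home/hadoop/newdata/hadoop-1.2.1/tmp</value>
  </property>
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>
</configuration>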


 

7. masters

node1

8. slaves

node2
node3
node4
node5
node6
node7
node8

9. hbase-env.sh

Add

export JAVA_HOME=/usr/lib/jvm/java-6-sun

and uncomment

export HBASE_MANAGES_ZK=true

A value of true means HBase uses its bundled ZooKeeper. To run a standalone ZooKeeper instead, set it to false and install ZooKeeper separately.

10. hbase-site.xml

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://node1:49000/hbase</value>
    <description>The directory shared by RegionServers.</description>
  </property>

  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    <description>The mode the cluster will be in. Possible values are
      false: standalone and pseudo-distributed setups with managed Zookeeper
      true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
    </description>
  </property>

  <property>
    <name>hbase.master</name>
    <value>node1:60000</value>
  </property>

  <property>
    <name>hbase.tmp.dir</name>
    <value>/home/hadoop/newdata/hbase/tmp</value>
    <description>
      Temporary directory on the local filesystem.
      Change this setting to point to a location more permanent than '/tmp',
      the usual resolve for java.io.tmpdir,
      as the '/tmp' directory is cleared on machine restart.
      Default: ${java.io.tmpdir}/hbase-${user.name}
    </description>
  </property>

  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>node2,node3,node4,node5,node6,node7,node8</value>
    <description>
      Use an odd number of servers. Comma separated list of servers in the
      ZooKeeper ensemble (This config. should have been named
      hbase.zookeeper.ensemble).
      For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
      By default this is set to localhost for local and pseudo-distributed
      modes of operation.
      For a fully-distributed setup,
      this should be set to a full list of ZooKeeper ensemble servers.
      If HBASE_MANAGES_ZK is set in hbase-env.sh this is the list of servers
      which hbase will start/stop ZooKeeper on as part of cluster start/stop.
      Client-side, we will take this list of ensemble members and put it
      together with the hbase.zookeeper.clientPort config.
      and pass it into zookeeper constructor as the connectString parameter.
      Default: localhost
    </description>
  </property>

  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/hadoop/newdata/zookeeper</value>
    <description>
      Property from ZooKeeper's config zoo.cfg.
      The directory where the snapshot is stored.
      Default: ${hbase.tmp.dir}/zookeeper
    </description>
  </property>
</configuration>
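
The note about using an odd number of ZooKeeper servers follows from majority voting: the ensemble stays available only while more than half of its members are up, so an even-sized ensemble tolerates no more failures than the odd size just below it. A small Python sketch of that arithmetic, with quorum_summary as a made-up helper name:

```python
def quorum_summary(quorum_value):
    """Summarize a comma-separated hbase.zookeeper.quorum value."""
    hosts = [h.strip() for h in quorum_value.split(",") if h.strip()]
    size = len(hosts)
    majority = size // 2 + 1  # members that must be up for the ensemble to work
    return {
        "hosts": hosts,
        "size": size,
        "is_odd": size % 2 == 1,
        "tolerated_failures": size - majority,
    }


summary = quorum_summary("node2,node3,node4,node5,node6,node7,node8")
print(summary["size"], summary["is_odd"], summary["tolerated_failures"])  # 7 True 3
```

With the seven hosts listed in hbase.zookeeper.quorum the ensemble survives three failures. An eighth host would raise the majority to five and still allow only three.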
        
    

    
        
        
    

11. regionservers

node2
node3
node4
node5
node6
node7
node8 

The configuration must be identical on every machine.
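
One way to confirm that the machines really do carry the same configuration is to compute a digest of the configuration directory on each node and compare the results. A minimal Python sketch, assuming all the files sit in one directory per node; config_digest and its default file patterns are illustrative choices, not part of Hadoop or HBase.

```python
import hashlib
from pathlib import Path


def config_digest(conf_dir, patterns=("*.xml", "*.sh", "masters", "slaves", "regionservers")):
    """Return a SHA-256 hex digest over the names and contents of matching config files."""
    digest = hashlib.sha256()
    # Sort so the digest does not depend on directory listing order.
    files = sorted({p for pat in patterns for p in Path(conf_dir).glob(pat) if p.is_file()})
    for path in files:
        digest.update(path.name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()
```

Running it against the Hadoop and HBase configuration directories on each node and comparing the hex strings shows whether any file has diverged. It does not say which file, so a mismatch still needs a manual diff.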