生产实习03-zookeeper与Hbase基础

提示

为什么跳到记录第七天了?因为前几天教的是JAVAWEB相关的知识，在技术栈有比较所以就不记录了，同时在上一周解决了我们项目的后端，详细见github: wrm244/bigdata_depression

zookeeper

Zookeeper是一个分布式协调服务，其主要作用是为分布式系统提供可靠的协调和一致性功能。总的来说，Zookeeper通过提供一致性和可靠的分布式协调服务，帮助解决了分布式系统中的各种问题，包括数据一致性、节点故障处理、分布式任务调度等。Zookeeper与HBase之间有密切的关系，可以说Zookeeper是HBase的基础设施之一

配置ZooKeeper

进入ZooKeeper解压后的目录，找到conf子目录。复制zoo_sample.cfg文件并将复制的文件重命名为zoo.cfg。打开zoo.cfg文件，根据需要进行适当的配置更改。主要的配置项包括：dataDir（ZooKeeper数据目录）、clientPort（客户端连接端口）等。可参考如下配置：

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial 
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
dataDir=/opt/soft/zookeeper/zkData
dataLogDir=/opt/soft/zookeeper/zkLog
# the port at which the clients will connect
clientPort=2181
server.1=hadoop:2888:3888
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the 
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1

Hbase

HBase是一个开源的分布式列式数据库，主要用于存储和处理大规模结构化数据。HBase的主要作用是提供高可扩展性、高性能、弹性存储和实时查询能力，适用于处理大规模结构化数据的存储和分析需求。它在大数据领域广泛应用于日志分析、用户行为分析、在线交互式应用等场景。

配置HBASE

hbase-site.xmlxml
<configuration>
  <property>
  <name>hbase.rootdir</name>
  <value>hdfs://hadoop:9000/hbase</value>
  </property>
  <property>
   <name>hbase.zookeeper.quorum</name>
   <value>hadoop:2181</value>
  </property>
  
  <property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>./tmp</value>
  </property>
  <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
  </property>
</configuration>

备注

hbase.rootdir：设置Hadoop的HDFS目录 hbase.zookeeper.quorum：设置ZooKeeper的地址 hbase.cluster.distributed：若要屏蔽HBase自带的ZooKeeper并使用外部的ZooKeeper实例。

conf/hbase-env.shsh
export JAVA_HOME=/usr/local/jdk

启动停止HBase服务

启动HBase服务，打开终端或命令提示符，进入HBase的安装目录。使用以下命令启动HBase服务。

bin/start-hbase.sh

使用以下命令停止HBase服务：

bin/stop-hbase.sh

使用HBase Shell 打开终端并进入HBase shell

hbase shell

HBase案例

使用HBase提供的importtsv工具来导入CSV文件。这个工具可以将CSV文件转换为HBase的KeyValue格式并导入到指定的表中。

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.columns=HBASE_ROW_KEY,info:dt,info:AverageTemperature,info:uncertainty,info:state,info:country stateTemperatures hdfs://node1:8020/weather/stateTemperatures.csv

扫描查看导入进去的数据

[hadoop@node1 ~]$ hbase shell
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
For Reference, please visit: http://hbase.apache.org/2.0/book.html#shell
Version 2.5.5, r7ebd4381261fefd78fc2acf258a95184f4147cee, Thu Jun  1 17:42:49 PDT 2023
Took 0.0021 seconds                                                                                                              
hbase:001:0> scan 'stateTemperatures',{LIMIT => 2}
ROW                               COLUMN+CELL                                                                                    
 1                                column=info:AverageTemperature, timestamp=2023-07-04T15:42:30.348, value=25.544                
 1                                column=info:country, timestamp=2023-07-04T15:42:30.348, value=Brazil                           
 1                                column=info:dt, timestamp=2023-07-04T15:42:30.348, value=1855-05-01                            
 1                                column=info:state, timestamp=2023-07-04T15:42:30.348, value=Acre                               
 1                                column=info:uncertainty, timestamp=2023-07-04T15:42:30.348, value=1.171                        
 10                               column=info:AverageTemperature, timestamp=2023-07-04T15:42:30.348, value=24.658                
 10                               column=info:country, timestamp=2023-07-04T15:42:30.348, value=Brazil                           
 10                               column=info:dt, timestamp=2023-07-04T15:42:30.348, value=1856-02-01                            
 10                               column=info:state, timestamp=2023-07-04T15:42:30.348, value=Acre                               
 10                               column=info:uncertainty, timestamp=2023-07-04T15:42:30.348, value=1.147                        
2 row(s)
Took 0.5937 seconds                                                                                                              
hbase:002:0> 

zookeeper​

配置ZooKeeper​

Hbase​

配置HBASE​

启动停止HBase服务​

HBase案例​