Common HBase Shell Commands

2018-07-22 2024-12-31

bigdata


HBase is a distributed, column-oriented, open-source database.

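Apache HBase is a key/value store that runs on top of HDFS. Unlike Hive, HBase serves reads and writes against its data in real time rather than running MapReduce jobs. Data in HBase is organized into tables, and each table is further divided into column families. Column families must be declared in the table schema, and each one groups a set of related columns (the columns themselves do not need to be declared). Each key/value pair in HBase is called a cell, and each key consists of the row key, column family, column, and timestamp. A row is a collection of key/value mappings that is uniquely identified by its row key. HBase builds on the Hadoop infrastructure and scales horizontally on commodity hardware.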
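To make the cell model above concrete, here is a minimal Python sketch. It is only a toy in-memory model, not how HBase stores data; the class name TinyTable and its methods are made up for this illustration.

```python
from typing import Dict, Tuple

# A cell key: (row key, column family, column qualifier, timestamp).
CellKey = Tuple[str, str, str, int]


class TinyTable:
    """Toy in-memory model of an HBase table (illustration only)."""

    def __init__(self, families):
        # Column families are fixed by the schema; columns are not.
        self.families = set(families)
        self.cells: Dict[CellKey, str] = {}

    def put(self, row, family, qualifier, value, ts):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.cells[(row, family, qualifier, ts)] = value

    def get_row(self, row):
        """Return the newest value of every column in one row."""
        latest = {}
        for (r, fam, qual, ts), value in self.cells.items():
            if r != row:
                continue
            col = f"{fam}:{qual}"
            if col not in latest or ts > latest[col][0]:
                latest[col] = (ts, value)
        return {col: value for col, (ts, value) in latest.items()}


table = TinyTable(["info", "address"])
table.put("Sariel", "info", "age", "25", ts=1)
table.put("Sariel", "info", "age", "26", ts=2)  # newer version of the same column
table.put("Sariel", "address", "city", "beijing", ts=1)
print(table.get_row("Sariel"))
```

Writing to a column family that was not declared raises an error, while any new column inside a declared family is accepted, which mirrors the schema rule described above.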

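Hive is a SQL-like engine that runs MapReduce jobs. It is well suited to analytical queries over data collected across a period of time.
HBase is a NoSQL key/value database on top of Hadoop. It is well suited to real-time queries over large data sets.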

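The HBase shell is HBase's command-line tool. It plays a role similar to a SQL client for a traditional database: you can use shell commands to inspect the data stored in HBase in detail. After installing HBase, if the HBase environment variables are configured, running hbase shell in a terminal opens the command-line interface.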

namespace

hbase(main):002:0> create_namespace 'ark'
0 row(s) in 0.6910 seconds

hbase(main):003:0> describe_namespace 'ark'
DESCRIPTION
{NAME => 'ark'}
1 row(s) in 0.0330 seconds

hbase(main):004:0> list_namespace
NAMESPACE
ark
default
hbase
3 row(s) in 0.0410 seconds

hbase(main):006:0> create 'ark:users','info','roles'
0 row(s) in 4.5260 seconds

=> Hbase::Table - ark:users

hbase(main):007:0> list_namespace_tables 'ark'
TABLE
users
1 row(s) in 0.0330 seconds
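In the transcript above, a table inside a namespace is addressed as namespace:table, and a table created without a prefix lands in the default namespace. A small Python sketch of that naming rule follows; the helper name split_table_name is hypothetical and is not part of any HBase client.

```python
def split_table_name(name: str):
    """Split 'namespace:table' into (namespace, table)."""
    namespace, sep, table = name.partition(":")
    if not sep:
        # No namespace prefix: the table lives in 'default'.
        return "default", name
    return namespace, table


print(split_table_name("ark:users"))
print(split_table_name("member"))
```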

DDL

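Tables are created with the create command. Its first argument is the table name, followed by a list of column families. Each column family can independently specify the number of versions it keeps, the data time-to-live (TTL), whether the block cache is enabled, and so on.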

create 'ark:t1', {NAME =>'f1',VERSIONS =>1,BLOCKCACHE => true},'f2'

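A table can also be created with a pre-split number of regions and a split algorithm. When a table is first created, HBase assigns it only a single region by default. This means that all access to the table initially goes to a single region server, so the cluster's resources are not fully used. HBase provides a utility for managing a table's regions, org.apache.hadoop.hbase.util.RegionSplitter, along with split options on the HBase shell create command. For example: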

hbase(main):013:0> exists 'ark:t2'
Table ark:t2 does not exist
0 row(s) in 0.0320 seconds

hbase(main):014:0> create 'ark:t2', 'f1', {NUMREGIONS => 3, SPLITALGO => 'HexStringSplit'}
0 row(s) in 4.5200 seconds
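To get a feel for what pre-splitting into 3 regions with a hex-string algorithm means, the sketch below evenly divides an 8-character hexadecimal key space and prints the boundaries between regions. It is an approximation written for this post, not the RegionSplitter source; the function name hex_split_points and the 8-character width are assumptions.

```python
def hex_split_points(num_regions: int, width: int = 8):
    """Evenly divide the hex key space 00..00 to ff..ff into num_regions parts."""
    if num_regions < 1:
        raise ValueError("num_regions must be positive")
    key_space = 16 ** width
    step = key_space // num_regions
    # n regions need n - 1 boundaries between them.
    return [format(step * i, f"0{width}x") for i in range(1, num_regions)]


print(hex_split_points(3))
```

This kind of split only spreads load evenly if the row keys themselves are uniformly distributed hex strings, for example hashed keys.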

Use enable and disable to enable or disable a table. Correspondingly, is_enabled and is_disabled check whether the table is currently enabled or disabled.

hbase(main):017:0> is_enabled 'ark:t2'
true
0 row(s) in 0.0190 seconds

hbase(main):018:0> disable 'ark:t2'
0 row(s) in 4.5480 seconds


hbase(main):023:0> alter 'ark:t1' ,{NAME =>'f1',VERSIONS=>6}
Updating all regions with the new schema...
0/1 regions updated.
1/1 regions updated.
Done.
0 row(s) in 3.8640 seconds

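Use alter to change a table's properties, such as the attributes of a column family, as in the command above. This involves pushing the new schema to all of the table's regions.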

hbase(main):024:0> describe 'ark:t1'
Table ark:t1 is ENABLED
ark:t1
COLUMN FAMILIES DESCRIPTION
{NAME => 'f1', BLOOMFILTER => 'ROW', VERSIONS => '6', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL =>
 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'f2', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL =>
 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s) in 0.0260 seconds
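The VERSIONS attribute changed above controls how many timestamped versions of a cell are kept. The following Python sketch models only that retention rule; the function name keep_versions is hypothetical, and the model is simplified (in a real cluster, surplus versions are hidden on read and physically removed later during compaction).

```python
def keep_versions(cells, max_versions):
    """cells: list of (timestamp, value); keep the newest max_versions."""
    newest_first = sorted(cells, key=lambda c: c[0], reverse=True)
    return newest_first[:max_versions]


history = [(1, "a"), (2, "b"), (3, "c")]
print(keep_versions(history, 1))  # like VERSIONS => 1
print(keep_versions(history, 6))  # like VERSIONS => 6
```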

Another very common operation is adding and removing column families.

hbase(main):025:0> alter 'ark:t1','f3'
Updating all regions with the new schema...
0/1 regions updated.
1/1 regions updated.
Done.
0 row(s) in 3.9290 seconds

hbase(main):026:0> describe 'ark:t1'
Table ark:t1 is ENABLED
ark:t1
COLUMN FAMILIES DESCRIPTION
{NAME => 'f1', BLOOMFILTER => 'ROW', VERSIONS => '6', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL =>
 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'f2', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL =>
 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'f3', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL =>
 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
3 row(s) in 0.0200 seconds

hbase(main):027:0> alter 'ark:t1' ,{NAME=>'f3',METHOD=>'delete'}
Updating all regions with the new schema...
0/1 regions updated.
1/1 regions updated.
Done.
0 row(s) in 4.2140 seconds

hbase(main):029:0> describe 'ark:t1'
Table ark:t1 is ENABLED
ark:t1
COLUMN FAMILIES DESCRIPTION
{NAME => 'f1', BLOOMFILTER => 'ROW', VERSIONS => '6', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL =>
 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'f2', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL =>
 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s) in 0.0230 seconds

A table must be disabled before it can be dropped.

hbase(main):032:0> disable 'ark:t1'
0 row(s) in 2.3810 seconds

hbase(main):033:0> drop 'ark:t1'
0 row(s) in 2.3310 seconds

put and get

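In the HBase shell, data is inserted with the put command. For example, we create a new table with three column families, id, address, and info, and insert some data. Columns within a column family do not need to be created in advance; they are specified when needed using the family:column form.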

create 'member','id','address','info'

put 'member', 'debugo','id','11'
put 'member', 'debugo','info:age','27'
put 'member', 'debugo','info:birthday','1987-04-04'
put 'member', 'debugo','info:industry', 'it'
put 'member', 'debugo','address:city','beijing'
put 'member', 'debugo','address:country','china'
put 'member', 'Sariel', 'id', '21'
put 'member', 'Sariel','info:age', '26'
put 'member', 'Sariel','info:birthday', '1988-05-09'
put 'member', 'Sariel','info:industry', 'it'
put 'member', 'Sariel','address:city', 'beijing'
put 'member', 'Sariel','address:country', 'china'
put 'member', 'Elvis', 'id', '22'
put 'member', 'Elvis','info:age', '26'
put 'member', 'Elvis','info:birthday', '1988-09-14'
put 'member', 'Elvis','info:industry', 'it'
put 'member', 'Elvis','address:city', 'beijing'
put 'member', 'Elvis','address:country', 'china'
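Typing many put lines by hand is tedious, so one option is to generate them from a data structure and paste the result into the shell. Below is a minimal Python sketch of that idea. The helper name put_commands is made up for this example, and it does no quoting or escaping, so it assumes that row keys and values contain no single quotes.

```python
def put_commands(table, rows):
    """Yield one HBase shell put command per (row, column, value)."""
    for row_key, columns in rows.items():
        for column, value in columns.items():
            yield f"put '{table}', '{row_key}', '{column}', '{value}'"


rows = {
    "debugo": {"info:age": "27", "address:city": "beijing"},
}
for line in put_commands("member", rows):
    print(line)
```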

Queries

1. Count the number of rows in a table: count

count 'member'

2. get operations

1) Get all data for one row key

get 'member','Sariel'
COLUMN                                CELL
 address:city                         timestamp=1532076882889, value=beijing
 address:country                      timestamp=1532076882925, value=china
 id:                                  timestamp=1532076882756, value=21
 info:age                             timestamp=1532076882780, value=26
 info:birthday                        timestamp=1532076882808, value=1988-05-09
 info:industry                        timestamp=1532076882833, value=it
6 row(s) in 0.0570 seconds

2) Get all data in one column family (or one column) for a row key

hbase(main):001:0> get 'member' ,'Sariel','info'
COLUMN                                CELL
 info:age                             timestamp=1532076882780, value=26
 info:birthday                        timestamp=1532076882808, value=1988-05-09
 info:industry                        timestamp=1532076882833, value=it
3 row(s) in 0.4630 seconds

3. scan operations

1) Scan the whole table

hbase(main):002:0> scan 'member'
ROW                                   COLUMN+CELL
 Elvis                                column=address:city, timestamp=1532076883093, value=beijing
 Elvis                                column=address:country, timestamp=1532076884185, value=china
 Elvis                                column=id:, timestamp=1532076882949, value=22
 Elvis                                column=info:age, timestamp=1532076882972, value=26
 Elvis                                column=info:birthday, timestamp=1532076883041, value=1988-09-14
 Elvis                                column=info:industry, timestamp=1532076883066, value=it
 Sariel                               column=address:city, timestamp=1532076882889, value=beijing
 Sariel                               column=address:country, timestamp=1532076882925, value=china
 Sariel                               column=id:, timestamp=1532076882756, value=21
 Sariel                               column=info:age, timestamp=1532076882780, value=26
 Sariel                               column=info:birthday, timestamp=1532076882808, value=1988-05-09
 Sariel                               column=info:industry, timestamp=1532076882833, value=it
 debugo                               column=address:city, timestamp=1532076882699, value=beijing
 debugo                               column=address:country, timestamp=1532076882725, value=china
 debugo                               column=id:, timestamp=1532076882582, value=11
 debugo                               column=info:age, timestamp=1532076882627, value=27
 debugo                               column=info:birthday, timestamp=1532076882650, value=1987-04-04
 debugo                               column=info:industry, timestamp=1532076882676, value=it
3 row(s) in 0.0870 seconds
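Note the order of the rows in the scan output: Elvis, Sariel, debugo. HBase keeps rows sorted by the bytes of the row key, so uppercase ASCII letters sort before lowercase ones, regardless of insertion order. A short Python sketch reproduces that ordering; the function name scan_order is hypothetical.

```python
def scan_order(row_keys):
    """Sort row keys the way a byte-wise lexicographic comparison would."""
    return sorted(row_keys, key=lambda k: k.encode("utf-8"))


print(scan_order(["debugo", "Sariel", "Elvis"]))
print(scan_order(["10", "9", "2"]))  # numeric-looking keys sort as text
```

The second call shows why numeric row keys are often zero-padded: as text, "10" sorts before "2".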

2) Scan an entire column family

hbase(main):003:0> scan 'member',{COLUMN=>'info'}
ROW                                   COLUMN+CELL
 Elvis                                column=info:age, timestamp=1532076882972, value=26
 Elvis                                column=info:birthday, timestamp=1532076883041, value=1988-09-14
 Elvis                                column=info:industry, timestamp=1532076883066, value=it
 Sariel                               column=info:age, timestamp=1532076882780, value=26
 Sariel                               column=info:birthday, timestamp=1532076882808, value=1988-05-09
 Sariel                               column=info:industry, timestamp=1532076882833, value=it
 debugo                               column=info:age, timestamp=1532076882627, value=27
 debugo                               column=info:birthday, timestamp=1532076882650, value=1987-04-04
 debugo                               column=info:industry, timestamp=1532076882676, value=it
3 row(s) in 0.0600 seconds

3) Scan a specific column

hbase(main):006:0* scan 'member', {COLUMNS=> 'info:birthday'}
ROW                                   COLUMN+CELL
 Elvis                                column=info:birthday, timestamp=1532076883041, value=1988-09-14
 Sariel                               column=info:birthday, timestamp=1532076882808, value=1988-05-09
 debugo                               column=info:birthday, timestamp=1532076882650, value=1987-04-04
3 row(s) in 0.0280 seconds

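4) Besides the COLUMNS modifier, HBase also supports LIMIT (limits the number of rows returned), STARTROW (the row key to start from; the scan first locates the region containing this key and then scans forward), STOPROW (the row to stop at), TIMERANGE (restricts the timestamp range), VERSIONS (number of versions), and FILTER (filters rows by condition). For example, starting from the row key Sariel, return one row with only the latest version of each cell: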

hbase(main):007:0> scan 'member', { STARTROW => 'Sariel', LIMIT=>1, VERSIONS=>1}
ROW                                   COLUMN+CELL
 Sariel                               column=address:city, timestamp=1532076882889, value=beijing
 Sariel                               column=address:country, timestamp=1532076882925, value=china
 Sariel                               column=id:, timestamp=1532076882756, value=21
 Sariel                               column=info:age, timestamp=1532076882780, value=26
 Sariel                               column=info:birthday, timestamp=1532076882808, value=1988-05-09
 Sariel                               column=info:industry, timestamp=1532076882833, value=it
1 row(s) in 0.0360 seconds
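The way STARTROW, STOPROW, and LIMIT interact can be modelled with a sorted list of row keys. In the sketch below the start row is inclusive and the stop row is exclusive, which is the default behaviour of an HBase scan. The function name scan_rows is an assumption for illustration, and plain string comparison stands in for byte comparison, which is only equivalent for ASCII keys.

```python
from bisect import bisect_left


def scan_rows(sorted_keys, start_row=None, stop_row=None, limit=None):
    """Return row keys in [start_row, stop_row), at most `limit` of them."""
    lo = bisect_left(sorted_keys, start_row) if start_row is not None else 0
    hi = bisect_left(sorted_keys, stop_row) if stop_row is not None else len(sorted_keys)
    selected = sorted_keys[lo:hi]
    return selected if limit is None else selected[:limit]


keys = ["Elvis", "Sariel", "debugo"]  # already in byte order
print(scan_rows(keys, start_row="Sariel", limit=1))
print(scan_rows(keys, start_row="F"))
print(scan_rows(keys, start_row="Elvis", stop_row="debugo"))
```

The start row does not have to exist: starting at "F" simply begins at the first key that sorts at or after it.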

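5) FILTER is a very powerful modifier that lets you specify a set of conditions to filter on. For example, to return only the cells whose value equals 26:

# value equals (binary comparator, exact match)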
hbase(main):010:0> scan 'member', FILTER=>"ValueFilter(=,'binary:26')"
ROW                                   COLUMN+CELL
 Elvis                                column=info:age, timestamp=1532076882972, value=26
 Sariel                               column=info:age, timestamp=1532076882780, value=26
2 row(s) in 0.0740 seconds

# value contains a substring
hbase(main):013:0> scan 'member', FILTER=>"ValueFilter(=,'substring:6')"
ROW                                   COLUMN+CELL
 Elvis                                column=info:age, timestamp=1532076882972, value=26
 Sariel                               column=info:age, timestamp=1532076882780, value=26
2 row(s) in 0.0280 seconds

# column name starts with the prefix birth
hbase(main):016:0> scan 'member', FILTER=>"ColumnPrefixFilter('birth')"
ROW                                   COLUMN+CELL
 Elvis                                column=info:birthday, timestamp=1532076883041, value=1988-09-14
 Sariel                               column=info:birthday, timestamp=1532076882808, value=1988-05-09
 debugo                               column=info:birthday, timestamp=1532076882650, value=1987-04-04
3 row(s) in 0.8280 seconds

# FILTER supports combining multiple conditions with parentheses, AND, and OR
hbase(main):019:0> scan 'member',FILTER=>"ColumnPrefixFilter('birth') AND ValueFilter(=,'substring:1988')"
ROW                                   COLUMN+CELL
 Elvis                                column=info:birthday, timestamp=1532076883041, value=1988-09-14
 Sariel                               column=info:birthday, timestamp=1532076882808, value=1988-05-09
2 row(s) in 0.0950 seconds
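The filter expressions above can be read as predicates applied to every cell. The Python sketch below mimics ValueFilter with the binary and substring comparators, ColumnPrefixFilter, and an AND combination over a few of the member cells. All function names are made up for this illustration; the only HBase behaviours assumed are that the substring comparator ignores case and that the column prefix is matched against the qualifier (the part after the family).

```python
CELLS = [
    ("Elvis", "info:age", "26"),
    ("Elvis", "info:birthday", "1988-09-14"),
    ("Sariel", "info:age", "26"),
    ("Sariel", "info:birthday", "1988-05-09"),
    ("debugo", "info:age", "27"),
    ("debugo", "info:birthday", "1987-04-04"),
]


def value_binary(expected):
    # Like ValueFilter(=,'binary:...'): exact match on the whole value.
    return lambda row, column, value: value == expected


def value_substring(needle):
    # Like ValueFilter(=,'substring:...'): case-insensitive containment.
    return lambda row, column, value: needle.lower() in value.lower()


def column_prefix(prefix):
    # Like ColumnPrefixFilter: matched against the qualifier after "family:".
    return lambda row, column, value: column.split(":", 1)[1].startswith(prefix)


def both(f, g):
    # Like "f AND g": a cell must pass both predicates.
    return lambda *cell: f(*cell) and g(*cell)


def scan(cells, keep):
    return [cell for cell in cells if keep(*cell)]


print(scan(CELLS, value_binary("26")))
print(scan(CELLS, both(column_prefix("birth"), value_substring("1988"))))
```

Because a value filter looks at every cell, a value such as 26 would match in any column that happens to hold it, not only info:age.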

# row key starts with a given prefix
scan 'db_demobank626:dim_p', FILTER=>"PrefixFilter('profile|xwhov')"

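Title: Common HBase Shell Commands
Author: LoganShen
Link: https://blog.95id.com/hbase-shell.html
Published: 2018-07-22
License: Unless otherwise stated, all posts on this blog are licensed under CC BY-NC-SA 4.0. Please credit the source when republishing!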

bigdata, hbase
