Configuring BigInsights 4.x to use Isilon OneFS 7.x or 8.x remote storage


In a standard Hadoop / Spark cluster deployment

A cluster contains Management nodes and Data nodes (Edge nodes are optional).

a) Management node(s) contain Hadoop / Spark services that use CPU / Memory resources and normally do not require large amounts of data storage. They execute CPU-intensive tasks with data held in memory. Only small logs are stored. Requests for information are served over the network from memory.

Examples – Name node, Secondary Name node, HBase Master, Zookeepers, Big SQL Head node (DB2 DPF Coordinator node).

b) Data node(s) contain Hadoop / Spark services that use CPU / Memory and large amounts of local Data storage.

Data nodes contain large amounts of high-speed local storage accessed by the Map tasks, Reduce tasks, HBase Region services and Spark tasks that execute on them. CPU and Memory within Data nodes are tailored to the expected workload. The IO subsystem is designed for low IO latency and high IO bandwidth at a low cost per GB of storage. Data protection is provided by data replication within the cluster. Examples – Map tasks, Reduce tasks, Spark tasks, HDFS, HBase Region Servers.
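To make the cost of replication-based data protection concrete, here is a minimal sketch of the capacity arithmetic. It assumes the common HDFS default replication factor of 3 (dfs.replication); the node count, disks per node and disk size are hypothetical figures chosen for illustration, and the estimate ignores space reserved for logs, temporary data and the operating system.

```python
def usable_capacity_tb(nodes: int, disks_per_node: int, disk_tb: float,
                       replication: int = 3) -> float:
    """Approximate usable HDFS capacity when every block is stored
    `replication` times across the Data nodes."""
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    raw_tb = nodes * disks_per_node * disk_tb  # total raw disk in the cluster
    return raw_tb / replication                # each block consumes `replication` copies


if __name__ == "__main__":
    # Hypothetical cluster: 10 Data nodes, 12 disks each, 4 TB per disk.
    print(usable_capacity_tb(nodes=10, disks_per_node=12, disk_tb=4.0))
```

With these example figures, 480 TB of raw disk yields roughly 160 TB of usable space.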

When using Isilon remote storage in a cluster deployment

A cluster contains Management nodes, Compute nodes and Data nodes (with optional Edge nodes).

a) Management node(s) contain Hadoop / Spark services that use CPU / Memory resources and normally do not require large amounts of data storage. Examples – HBase Master, Zookeepers, Big SQL Head node (DB2 DPF Coordinator node).

b) Compute node(s) contain Hadoop / Spark services that use CPU / Memory resources and remotely access large amounts of data storage. Examples – Map tasks, Reduce tasks, Spark tasks, HBase Region Server tasks. Compute nodes may not contain local storage. Creating NFS shares may be necessary to provide storage space for services that keep data outside of HDFS. For NFS shares to work, the users / groups (UID/GID) must match between the cluster and Isilon.
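The UID/GID requirement can be checked before any share is mounted. The sketch below is one possible approach, not a documented procedure: it compares two passwd-format listings, one taken from a cluster node and one representing the users defined on the Isilon side. How the Isilon listing is produced is left open here, and the account names and numeric IDs in the sample data are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

Ids = Tuple[int, int]  # (uid, gid)


def parse_passwd(text: str) -> Dict[str, Ids]:
    """Map user name -> (uid, gid) from passwd-format text."""
    ids: Dict[str, Ids] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(":")
        if len(fields) < 4:
            continue  # not a valid passwd entry
        name, _password, uid, gid = fields[:4]
        ids[name] = (int(uid), int(gid))
    return ids


def find_mismatches(cluster: Dict[str, Ids], isilon: Dict[str, Ids],
                    users: Iterable[str]
                    ) -> Dict[str, Tuple[Optional[Ids], Optional[Ids]]]:
    """Return the users whose (uid, gid) differs or is missing on either side."""
    problems = {}
    for user in users:
        c, i = cluster.get(user), isilon.get(user)
        if c is None or i is None or c != i:
            problems[user] = (c, i)
    return problems


if __name__ == "__main__":
    # Hypothetical sample data; real input would be read from files.
    cluster_passwd = "hdfs:x:501:500::/home/hdfs:/bin/bash\nhbase:x:502:500::/home/hbase:/bin/bash\n"
    isilon_passwd = "hdfs:x:501:500::/ifs/home/hdfs:/bin/sh\nhbase:x:1002:500::/ifs/home/hbase:/bin/sh\n"
    bad = find_mismatches(parse_passwd(cluster_passwd),
                          parse_passwd(isilon_passwd),
                          ["hdfs", "hbase"])
    for user, (c, i) in bad.items():
        print(f"{user}: cluster={c} isilon={i}")
```

Any user reported here would own files under one ID on the cluster and another on the share, so the mismatch should be resolved before services write data over NFS.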

c) Isilon Data node(s) live within the Isilon Storage Cluster and are accessed remotely by the Management and Compute node(s) of the Hadoop / Spark cluster. Isilon Data nodes provide Name node, Secondary Name node, HDFS and SmartConnect DNS services for Hadoop / Spark. All other Hadoop and Spark services are expected to be deployed on Management or Compute nodes. Isilon also provides NFS and SMB remote file shares that can be exported and mounted by remote clients.
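Because Isilon supplies the Name node service behind a SmartConnect DNS name, the cluster's default file system setting ends up pointing at that name rather than at a cluster host. The following is a minimal sketch of how the resulting core-site.xml property might look. The zone name isilon-zone.example.com is hypothetical, port 8020 is assumed as the usual HDFS RPC port, and in practice the cluster manager normally writes this value for you.

```python
import xml.etree.ElementTree as ET


def default_fs_property(smartconnect_zone: str, port: int = 8020) -> str:
    """Render an fs.defaultFS <property> element that points HDFS clients
    at the given SmartConnect zone name (port 8020 is an assumption)."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "fs.defaultFS"
    ET.SubElement(prop, "value").text = f"hdfs://{smartconnect_zone}:{port}"
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical SmartConnect zone name, for illustration only.
    print(default_fs_property("isilon-zone.example.com"))
```

Using the SmartConnect name instead of an individual Isilon node address lets DNS spread client connections across the storage cluster.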

The following documents provide information on installation and configuration:

1) Install and Configure BI with Isilon OneFS Storage

2) EMC Isilon Starter Kit

3) Sample admin scripts
