2012 Agenda Abstracts

 

 

BIG DATA

Efficient Archiving and Fetching of 'Big Data' Files

Uttam Kaushik, Manager, Engineering, EMC DiskXtender product suite  

Abstract

Data is growing at an enormous pace. Social application adoption has generated huge volumes of data: people share photos and videos and discuss a wide range of topics. Marketing companies want this data to extract information about customer preferences. Similarly, research institutions and universities generate large amounts of research data. ‘Big Data’ has become ubiquitous. Another interesting case of big data arises when an application uses extremely large files; possible uses include databases, high-definition video, and scientific applications.

Archiving ‘Big Data’ files, as well as large blobs of data on the order of 500 GB, is very challenging for present-day data center infrastructure. The move toward cloud-based platforms and services gives this a different angle altogether. These workloads require larger network bandwidth and intelligently managed memory to transfer and archive files. File processing requires dynamic handling and processes that take into account currently available bandwidth and system resources.

Fetching big files poses a different and more critical challenge than migrating them. Many applications are time sensitive; they may not be able to wait until the entire file is fetched back. File streaming applications are one such example, where timing and buffering are critical.

This paper outlines the challenges in archiving and fetching such big files and proposes solutions to handle them effectively.

Learning Objective

  • Optimize Fetching of Large Files

 


Object Storage - Key to Big Data Infrastructure

Anil Vasudeva, President & Chief Analyst, IMEX Research

Abstract

In the era of Big Data, managing disparate storage solutions for structured data (databases, log files, text, value-based data) and unstructured data (audio, documents, emails, images, video) has become challenging for IT organizations. A key approach to managing big data is Object Storage’s rich metadata system, which allows full search functionality. Based on the open-source OpenStack platform, Object Storage allows users to easily store, search, and retrieve data across the Internet. These are object-based storage’s strengths: automating and streamlining data storage in cloud environments while storing unstructured, semi-structured, and structured data on the same storage system.

Learning Objectives

  • Learn how Object storage plays a fundamental role in adding metadata for handling predictive analytics in massive amounts of structured and unstructured data
  • Learn how the growth of unstructured data causes issues that are being addressed by Object Storage

 


Big Data and the Evolution of Tape Technologies

Kevin Dudak, Senior Product Manager, Spectra Logic

Abstract

The explosion of Big Data and the need to store critical digital content at the petabyte level and beyond are forcing organizations to look beyond traditional storage solutions. As tape capacities grow, new tape technologies are continuing to emerge to help solve today’s growing data needs. Tape is quickly becoming the de facto standard for Big Data storage solutions, and new tape drive densities are pushing the archive in new directions. This session will provide an overview of emerging tape technologies that help solve today’s big and growing data needs, including LTFS, LTO6 and TS1140, and will discuss how these technologies fit into the overall storage picture.

Learning Objectives

  • How Big Data is driving the need for more advanced storage solutions
  • The advantages of tape for data archiving
  • How today’s tape technologies fit into the overall storage picture

 


Accelerating Hadoop with Data Optimization

Hank Cohen, Director of Product Management, Altior Inc.

Abstract

We will present a technique to accelerate Hadoop and other Big Data applications by optimizing data in the local file system. Hadoop is often I/O bound, so increasing I/O rates will also reduce map/reduce execution time. A transparent compression/decompression file system can accelerate Hadoop without any change to workflow or applications.

Learning Objectives

  • Understand why Hadoop is frequently I/O bound
  • Understand how data optimization accelerates I/O and therefore addresses the I/O bound problem.
  • Compare and contrast native software compression and hardware accelerated compression in Hadoop environments.
  • Understand the costs and benefits of various approaches to compression in Hadoop
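The transparent-compression idea above can be sketched in a few lines: when a block compresses well, far fewer bytes cross the I/O path for the same logical data, which is exactly why an I/O-bound job speeds up. This is a minimal illustration using Python's zlib, not any particular vendor's file system.

```python
import zlib

def write_block(data: bytes, compress: bool = True) -> int:
    """Return the number of bytes that would actually hit the disk for one block."""
    payload = zlib.compress(data) if compress else data
    return len(payload)

# A repetitive block, typical of many Hadoop log/text workloads.
block = b"2012-01-01 INFO request ok\n" * 1000
raw = write_block(block, compress=False)
packed = write_block(block, compress=True)
print(raw, packed)  # the compressed payload is far smaller, so fewer bytes cross the I/O path
```

The application still reads and writes uncompressed bytes; only the on-disk representation shrinks, which is what makes the approach transparent to the workflow.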

BIRDS OF A FEATHER

NVM Programming TWG: 3 Months Old!

Paul von Behren

Abstract

The NVM Programming TWG was formed three months ago to accelerate availability of software that enables optimal use of Non-Volatile Memory. This BOF talks about the TWG’s area of focus, current work, and future plans. The target audience is driver, OS, and application developers planning to incorporate NVM capabilities. This BOF is not limited to TWG members; anyone may attend.

 


Solid State Storage TWG BoF

TBD

Abstract

Pending

 


Developing with Red Hat Storage

John Mark Walker

Abstract

This BoF invites attendees to learn how to build new filesystems with Red Hat Storage, including how to extend the platform via its translator APIs. Red Hat Storage, with its GlusterFS foundation, is implemented as a POSIX-like scale-out NAS filesystem, but developers can exploit its hacker friendly confines to build (and test) new filesystems on its scalable architecture. If tinkering with and creating new filesystems is a fun activity for you, you'll enjoy this discussion.

 


Fast Storage Access with InfiniBand and RDMA

Erin Filliater

Abstract

Today’s data centers are dealing with an unprecedented amount of data, and that data continues to grow each day. With the increased demand for data, storage access becomes critical to overall service delivery and data center performance. Join Mellanox Technologies and DataDirect Networks to talk about today’s data delivery challenges, the solutions provided by InfiniBand connected storage, and the future of data center storage interconnects.

 


Cloud Storage Implementations

Mark Carlson, Cloud Storage TWG

Agenda:

Implementing the SNIA CDMI Reference Implementation - Mark Carlson
Implementing the NetApp StorageGRID 9.0 - David Slik
Implementing the CDMI interface for OpenStack Swift - Tong Li (tentative)

 


OpenAFS

Jeffrey Altman, OpenAFS Gatekeeper

Abstract

OpenAFS is a distributed filesystem product, pioneered at Carnegie Mellon University and supported and developed as a product by Transarc Corporation (now IBM Pittsburgh Labs). It offers a client-server architecture for file sharing, providing location independence, scalability, security, and transparent migration capabilities for data.

IBM branched the source of the AFS product, and made a copy of the source available for community development and maintenance. They called the release OpenAFS.

Jeffrey Altman, OpenAFS Gatekeeper, will lead the BOF providing updates on recent improvements to the OpenAFS master branch and answer questions from attendees.

 

BLOCK PROTOCOL

SAS and SCSI Technology Advancements

Marty Czekalski, President - SCSI Trade Association, Interface and Emerging Architecture Program Manager
 

Abstract

SCSI has been the standard for enterprise storage for many years. Reliability, performance and compatibility are at the heart of SCSI and SAS, and why OEMs and IT managers trust it with enterprise data. The past year has brought advancements and innovations for SAS and SCSI, including: 12Gb/s SAS, SCSI Express (SCSI over PCIe), enhanced commands for SSDs and extended copy. This talk will also include a SAS standards and roadmap update.

Learning Objectives

  • Attendees will learn the status of 12Gb/s SAS in the standards, testing & plugfest plans and the expected launch schedule for 12Gb/s SAS products
  • SCSI Express is a new project running SCSI commands over PCI Express, and attendees will learn what it is, what it isn’t and how it will improve the use of flash storage in the enterprise
  • Attendees will learn about SCSI commands being created for use with SSDs to better manage and improve usability in servers and storage systems

 


iSCSI: Backing Blocks with Files

Paul Forgey, Principal Software Engineer, EMC Isilon Storage Division
 

Abstract

Traditionally, block based storage is a layer below a file system right before physical storage. Isilon’s OneFS has implemented iSCSI protocol support on top of a file system which presents interesting performance challenges. This talk addresses the challenges implementing a block interface to a file-based back end with redundant data guarantees.

Learning Objectives

  • Intentions of a reliable file system
  • Application-accepted levels of atomicity
  • Caching, write guarantees, and read-modify-write costs
  • Implementation of iSCSI backed by OneFS files

Fibre Channel over Ethernet (FCoE)

John Hufferd, Owner, Hufferd Enterprises

Abstract

The Fibre Channel (T11.3) standards committee developed a standard called Fibre Channel over Ethernet (FCoE). The FCoE standard specifies the encapsulation of Fibre Channel frames into Ethernet frames and the amalgamation of these technologies into a network fabric that can support Fibre Channel protocols as well as other protocols such as TCP/IP and UDP/IP. The tutorial will cover the fundamentals of these FCoE concepts, describe how they might be exploited in a data center environment, and position FCoE with regard to FC and iSCSI. The requirements on the Ethernet fabric for support of FC protocols will also be shown.

Learning Objectives

  • The audience will gain a general understanding of the concept of using a Data Center type Ethernet for the transmission of Fibre Channel protocols
  • The audience will gain an understanding of the benefits of converged I/O and how a Fibre Channel protocol can share an Ethernet network with other Ethernet based protocols.
  • The audience will gain an understanding of potential business value and configurations that will be appropriate for gaining maximum value from this converged I/O capability.
     

VN2VN: A new framework for L2 DAS

Prafulla Deuskar, Storage Networking Architect, Intel
Mark Wunderlich, Storage Technologist, Intel

Abstract

FC-BB-6 introduces support for new FCoE topologies, namely point-to-point (Pt-Pt) and multi-point, with the help of a new port type, the VN2VN port. This presentation provides usage models for these new topologies and discusses how they could be supported in Open-FCoE. Furthermore, we discuss the Open-FCoE target stack architecture in Linux and how VN2VN is supported in it.

Learning Objectives

  • FC-BB-6 - new topologies for FCoE
  • Usage models for Pt-Pt and Multi-point
  • Open-FCoE target stack architecture

CLOUD

Cloud File System and Cloud Data Management Interface (CDMI)

Ajit Nipunge, Solutions Architect, Calsoft Inc.
Parag Kulkarni, VP Engineering, Calsoft Inc.

Abstract

Seamless extension of NAS to cloud storage using a cloud file system.

Today’s Network Attached Storage (NAS) systems store data on local disks and/or SAN disks. Most enterprises have sufficient file storage capacity to run their day-to-day operations, provided some older data is moved to secondary storage. But since most of the data resides on primary storage (or on secondary storage within the enterprise boundaries), it becomes necessary to extend storage capacity for NAS.

With cloud storage expanding and becoming more secure, accessible, easy to use, and cost effective, it can be considered as secondary storage for enterprise NAS: Hierarchical Storage Management. We can even use cloud storage as primary storage by using the enterprise storage devices for caching to improve cloud data access throughput.

Adding CDMI-based interfaces to the cloud file system enables us to integrate with any cloud storage provider and store file-based data to cloud storage. The cloud file system presented and implemented by Calsoft integrates with many cloud storage providers using the Cloud Data Management Interface (CDMI). This helps enterprises store file-based data to cloud storage and provides throughput similar to local NAS by using efficient caching techniques.

Learning Objectives

  • To address the challenges faced by enterprises to store ever-growing data and optimally manage storage capacity
  • Hierarchical Storage Management across enterprise storage and cloud storage
  • Optimizing the storage capacity between on-premise and cloud storage pools
  • Easy migration between cloud storage platforms
  • CDMI – a move toward building an open standard for storing data in the Cloud

Open Source Droplet Project Update with S3 and SNIA CDMI Support

Giorgio Regni, CTO Scality
Philippe Nicolas, Director, Product Strategy, Scality

Abstract

Droplet, started two years ago, is a cross-cloud storage compatible client library under the BSD license. It belongs to the Scality Open Source Program and supports the SNIA CDMI and Amazon S3 protocols. Various extensions and tools have been developed by the community, such as a cloud migration tool, an incremental backup agent with data deduplication, a file system emulation, and a command-line interface.

Learning Objectives

  • Access CDMI compatible storage
  • Develop cross cloud applications
  • Migrate and Exchange data between clouds
  • Integrate Droplet in existing applications

Overcoming Challenges & Best Practices for Interoperability Testing of CDMI

Nishi Gupta, Senior Storage Architect, Tata Consultancy Services Ltd.
Hansi Agarwal, Tata Consultancy Services Ltd.

Abstract

Over the past years, interoperability has become more and more of a necessity in standardization, and industry demands delivery of products and services that are interoperable. SNIA addresses the lack of interoperability in cloud storage with the Cloud Data Management Interface (CDMI) specification. It tags user data with special metadata (data system metadata) that tells the cloud storage provider what data services (backup, archive, encryption, etc.) to provide for that data. It helps move user data from cloud vendor to cloud vendor without the pain of recoding against different interfaces. There is a growing trend of organizations adopting CDMI in their products, and hence interoperability becomes extremely important. TCS is working on a CDMI Automated Test Suite, which focuses on testing compliance with the CDMI specification. In this proposal we will share our observations and challenges during the development of the test suite and while testing products for CDMI compliance. This will help companies adopt best practices while developing CDMI-compliant products.

Learning Objectives

  • Understanding of the Cloud Data Management Interface
  • Understanding of how to develop an automated test suite for standards
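As a rough illustration of the data-system-metadata tagging described above, a CDMI object-create request carries the user's value plus metadata keys that tell the provider which data services to apply. The JSON below is a hand-built sketch (key names follow the CDMI data-system-metadata convention; exact names and allowed values should be checked against the specification), not output from any test suite.

```python
import json

# Hypothetical CDMI-style object body: user data plus data system metadata
# asking the provider for two copies and encryption of the stored object.
body = {
    "mimetype": "text/plain",
    "metadata": {
        "cdmi_data_redundancy": "2",       # desired number of copies
        "cdmi_encryption": "AES_XTS_128",  # requested encryption algorithm
    },
    "value": "hello cloud",
}
request = json.dumps(body)  # sent as the PUT body with a CDMI content type
```

A compliance test suite can then check that a provider honors (or at least round-trips) each data system metadata item it claims to support.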

CDMI Extensions

David Slik, Technical Director, Object Storage, NetApp

Abstract

As part of the CDMI 1.1 standardization effort, multiple extensions to the CDMI standard have been proposed by participating vendors and end users. This session provides a technical overview of extensions currently under public review, and will demonstrate select extensions in working implementations. Extensions include versioning, job management, and partial server-side copy; they can be reviewed at http://snia.org/tech_activities/publicreview/cdmi

Learning Objectives

  • Learn about the CDMI extensions process within SNIA
  • Learn about extensions currently proposed for CDMI 1.1
  • See a demonstration of select extensions

CDMI Federations Year 3

David Slik, Technical Director, Object Storage, NetApp

Abstract

In addition to standardizing client-to-cloud interactions, the SNIA Cloud Data Management Interface (CDMI) standard enables a powerful set of cloud-to-cloud interactions. Federations, being the mechanism by which CDMI clouds establish cloud-to-cloud relationships, provide a powerful multi-vendor and interoperable approach to peering, merging, splitting, migrating, delegating, sharing and exchange of stored objects.

In the last two SDC presentations, the basics of CDMI federation were discussed. For year three, we will review what is involved in making federations interoperable across multiple vendors, demonstrate two common use cases enabled by federation, and discuss the ongoing work within the SNIA Cloud Storage Technical Working Group to add federation as a formal part of CDMI 1.1, which is currently under development.

Learning Objectives

  • Review CDMI Federations, and how they help cloud users and providers
  • Review common use cases of CDMI Federations
  • Learn about updates to CDMI federation and ongoing standardization efforts
  • See a multi-system demonstration of CDMI Federations in action

CDMI Support for Object Storage in Cloud Storage

Padmavathy Madhusudhanan, Technical Manager, Wipro Technologies

Abstract

Object storage, a leading emerging technology, is mainly intended to handle the exponential growth of unstructured data. Unlike traditional storage of files in NAS or blocks in SAN, it uses data objects. Each object is assigned a unique object ID, and each object contains its own metadata along with the actual data, thereby removing centralized indexing. This enables massive scalability and geographic independence at reasonable cost. Pitfalls in IOPS performance, latency, and proprietary interfaces make object storage more suitable for archiving and backup operations than for primary storage; hence it has become ideal for cloud storage. CDMI, an industry standard, is meant for complete lifecycle management of objects in the cloud. How it can be leveraged and enhanced to support object storage will be addressed in detail in this paper.

Learning Objectives

  • Significance of object-based storage
  • Why it is considered more suitable for cloud storage
  • How CDMI can be leveraged for object data across vendor-specific clouds while meeting business policies
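The object model described above (a flat namespace of IDs, with metadata travelling alongside the data instead of a central index) can be sketched as a toy in-memory store. Class and field names here are illustrative only, not any vendor's API.

```python
import uuid

class ObjectStore:
    """Toy flat-namespace object store: each object gets a unique ID and
    carries its own metadata alongside the data (no central directory tree)."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, **metadata) -> str:
        oid = str(uuid.uuid4())  # object ID assigned by the store, not a path
        self._objects[oid] = {"data": data, "meta": metadata}
        return oid

    def get(self, oid: str):
        obj = self._objects[oid]
        return obj["data"], obj["meta"]

store = ObjectStore()
oid = store.put(b"frame-0001", content_type="video/raw", camera="A")
data, meta = store.get(oid)
```

Because each object is self-describing, nodes can be added or relocated without rebuilding a global index, which is the root of the scalability claim.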

Openstack Object Storage Overview

John Dickinson, Project Technical Lead Developer, Openstack/Rackspace

Abstract

Openstack Object Storage (called swift) is an open-source, distributed, eventually-consistent object storage system. It was written by developers at Rackspace to power Rackspace's Cloud Files product and then contributed to the Openstack project in 2010. Swift is designed to cheaply and reliably store massive amounts of data. In this talk, I will discuss where object storage is useful, how swift works, and how deployers and developers can use swift effectively.

Learning Objectives

  • Where does object storage fit in the storage landscape?
  • Why openness matters
  • Technical overview of swift's architecture
  • Writing clients to take advantage of swift
  • Deployment strategies for swift

A Case Study of an Object Storage Grid Using Tahoe-LAFS

Phillip Clark, Principal, ECC Data Corporation

Abstract

An overview of an example 5-node storage grid is given, along with an actual demonstration. The demonstration is done using VirtualBox and Linux guests running the Tahoe-LAFS open-source software. The demonstration highlights anchor concepts such as erasure coding, inherent security at any individual node, data availability and overhead costs, and some differences in client upload versus download. Throughout the presentation, contrasts are drawn between storage grids and traditional SAN/NAS technologies. Finally, needs and future directions are discussed including how grid storage relates to the cloud.

Learning Objectives

  • To understand an object storage grid through demonstration
  • Grid storage is not SAN/NAS - good uses, and not-so-good uses, of object storage technologies
  • Availability versus overhead tradeoffs in the grid
  • Roles of object storage in consumer, SMB, and enterprise spaces

Data Integrity in the Cloud

Christopher Hellwig, Principal Storage Engineer, Nebula

Abstract

Getting data to stable storage is surprisingly hard, and developers of the various levels of the I/O stack have struggled defining semantics for users of the I/O subsystems as well as guaranteeing them.  The mismatch of semantics at different levels already causes headaches for traditional enterprise systems, but gets even worse for virtualized environments and distributed cloud systems.  This talk pinpoints known errors and caveats in the implementation of complex I/O subsystems implementing block storage and filesystem semantics, and also shows how interconnected the two are,  especially in distributed environments.

Learning Objectives

  • Understanding of data integrity concerns
  • Basic understanding of I/O stacks in cloud environments 

Optimizing Sequence Alignment in Cloud Using Hadoop and MPP Database

Senthilkumar Vijayakumar, IT Analyst, Tata Consultancy Services Ltd

Abstract

In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. This information can effectively be used for medical and biological research only if one can extract functional insight from it. To obtain functional insight, the factors to be considered while aligning sequences are: optimized querying of sequences, high-speed matching, and accuracy of alignment. The FAST-All (FASTA) program, for both proteins and nucleotides, considers all these factors and follows a largely heuristic method, which contributes to the high speed of its execution. The program initially observes the pattern of word hits (word-to-word matches of a given length) and marks potential matches rather than performing a more time-consuming, optimized search using a Smith-Waterman type of algorithm.

This proposal is targeted at an optimized approach to sequence alignment using the FASTA algorithm, which incorporates high-speed word-to-word matching. In the current scenario, where data grows by petabytes a day and processing requires state-of-the-art technologies, the Greenplum Massively Parallel Processing (MPP) database and Hadoop are emerging parallel technologies that form the backbone of this proposal. The complex nature of the algorithm, coupled with the data and computational parallelism of a Hadoop grid and a massively parallel processing database for querying big datasets containing petabytes of sequences, improves the accuracy and speed of sequence alignment and optimizes querying from big datasets.

Bioinformatics labs and centers across the globe today upload enormous amounts of data and sequences to a central location for scientific analysis. The transfer of such large datasets can also be simplified with cloud approaches. So, cloud computing is a strong candidate as the end point for sequences and data gathered from sources such as medical research centers, scientists, and biomedical labs around the globe. A plan for the final “publicly consumable” form of the program is to make it web-based and running on the Cloud.

Learning Objectives

  • Understand how sequence alignment of DNA, RNA, or protein sequences identifies regions of similarity, and why functional insight must be extracted for that information to be useful in medical and biological research
  • Learn about various problems and challenges faced by medical and biological research organisations in the area of bioinformatics sequence alignments
  • Learn how cloud computing offers an attractive solution, providing massively scalable computational power with green credentials too
  • Learn how the complexity of the FASTA algorithm is addressed with Hadoop grids, which provide data and computational parallelism, and an MPP database for querying big datasets containing large sequences, improving performance and optimizing querying
  • A massively parallel processing database in the form of Greenplum, coupled with the computational power of Hadoop, built on the foundation of cloud and virtualization with an optimized FASTA algorithm, is “the next-generation solution”
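The heuristic word-hit stage that FASTA starts with can be sketched simply: index every length-k word of the target sequence, then report positions where query words match exactly. This toy version ignores scoring and diagonal chaining; it only shows the word-matching idea that makes the heuristic fast.

```python
def word_hits(query: str, target: str, k: int = 3):
    """Return (query_pos, target_pos) pairs where length-k words match exactly."""
    # Build a hash index of all k-length words in the target.
    index = {}
    for i in range(len(target) - k + 1):
        index.setdefault(target[i:i + k], []).append(i)
    # Look up each query word in the index instead of scanning the target.
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits

print(word_hits("ACGTAC", "TTACGT", k=3))  # → [(0, 2), (1, 3), (3, 1)]
```

In a Hadoop setting, the target database would be partitioned across the grid and each mapper would run this word-hit stage over its shard, which is where the data parallelism claimed above comes from.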

NoSQL in the Clouds with Windows Azure Table

Jai Haridas, Principal Development Manager, Microsoft

Abstract

Windows Azure Table is an easy-to-use NoSQL database that auto-scales to meet your throughput needs. It supports a schema-less store that enables you to shape data for efficient storage and retrieval. In this talk we will discuss patterns and best practices that will enable you to build massively scalable and durable applications without having to deal with manual sharding of your database.

Learning Objectives

  • Cloud Storage - Windows Azure Storage
  • Techniques for schema design in NoSql
  • Build scalable applications in cloud
  • Windows Azure Storage - Best Practices
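The "no manual sharding" point rests on Azure Table's key model: every entity is addressed by a (PartitionKey, RowKey) pair, and the service spreads partitions across servers for you. A toy in-memory model of the key scheme follows; names are illustrative and this is not the real SDK.

```python
# Toy model of table-style keying: entities live under a
# (PartitionKey, RowKey) pair; the real service scales by distributing
# whole partitions across servers, so the app never shards by hand.
table = {}

def insert(partition_key: str, row_key: str, entity: dict) -> None:
    table[(partition_key, row_key)] = entity

def point_query(partition_key: str, row_key: str) -> dict:
    # The fastest query shape: both keys known, one partition touched.
    return table[(partition_key, row_key)]

insert("customer-42", "order-0001", {"total": 19.99})
entity = point_query("customer-42", "order-0001")
```

Choosing a PartitionKey that spreads load (rather than funnelling all writes into one hot partition) is the schema-design decision the talk's "patterns and best practices" revolve around.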

Rainy Days & Boot Storms Always Get Me Down (Evaluating Clouds: What You Need to Know)

Peter Murray, Senior Product Specialist, SwiftTest, Inc.

Abstract

Migrating to a private cloud infrastructure doesn’t have to get you down. Despite the complex interdependence between virtual machines, network blades, switches, and storage systems, there is a way to manage risk. The answer is testing.

Cloud testing provides insight into your configurations (beyond the standard benchmarks) as well as “sunny day” versus “rainy day” usage patterns. Testing helps ensure scalability and reliability, and shows how robust the new cloud infrastructure really is by detecting bottlenecks and reactions to a boot storm.

Attend this session to learn the top 5 things to ask when migrating to a cloud:

1. How will it work and scale?
2. How will it react to a boot storm?
3. What is the failover behavior under load?
4. How will VM behavior and load affect the performance and availability of the storage services?
5. What is the relative scalability of the storage component with regard to the compute component?

Learning Objectives

  • Questions to ask and lessons to learn when migrating to a private cloud

Bridging POSIX-like APIs and Cloud Storage

Wesley Leggette, Software Architect, Cleversafe, Inc.

Abstract

Emerging Cloud Storage API standards, including CDMI and OpenStack Storage, are oriented towards REST APIs that allow streaming object access. Several features taken for granted in traditional filesystem APIs like POSIX and Windows are not present in this new paradigm. Examples include random file updates and directory operations. Bridging technologies must address this disconnect to provide full functionality in a performant manner.

We detail the techniques we have used to make cloud storage file drivers functional and efficient for both everyday user interaction and high-throughput, low-latency applications.

Learning Objectives

  • Learn about techniques to overcome functional limitations of cloud storage APIs.
  • Discover how file system bridge drivers solve requirements for both human users and batch processing applications.
  • Learn about required caching techniques and how to make them work better for high throughput workloads.

A Simple Open Source Cloud Storage System

Dan Pollack, Senior Operations Architect, AOL

Abstract

Pending


How to Store Data to the Cloud without Giving the Cloud Anything

Jason Resch, Senior Software Engineer, Cleversafe, Inc.

Abstract

There are two main barriers standing in the way of broad adoption of cloud storage by businesses:

1. Concerns over reliability for mission-critical data in the custody of the cloud storage provider
2. Concerns over the security of confidential data handed to the cloud storage provider

A new technique will be presented which utterly eliminates both of these concerns in a very cost-effective manner, thus making cloud storage a viable option for businesses previously held back by one or both of the aforementioned concerns. Lastly, it will be shown how the CDMI standard makes such a system much easier to realize in practice.

Learning Objectives

  • Introduction to new technique
  • Efficiency Properties of new method
  • Security properties of this method
  • Reliability Guarantees of new method
  • Leveraging SNIA's CDMI standard

Improving Cloud Storage Cost and Data Resiliency with Erasure Codes

Michael Penick, Senior Software Developer, GoDaddy.com

Abstract

Go Daddy's internal cloud storage solution uses replication to prevent data loss in the midst of frequent hardware failures. While replication works well as a simple failure-handling strategy, it significantly increases hardware costs due to its high data overhead. Erasure codes have allowed Go Daddy to reduce data overhead without decreasing availability or redundancy. Go Daddy developed software to convert our existing production systems from replication to erasure codes without any interruption in service.

Learning Objectives

  • Evaluation on open source erasure code libraries
  • Discussion of software features required to support erasure codes
  • Comparison of erasure codes and replication in a production cloud storage system
  • Lessons learned while migrating to erasure codes
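The cost argument can be made concrete with a little arithmetic: three-way replication stores 3x raw bytes per logical byte and tolerates two lost copies, while a 6-data/3-parity erasure code stores only 1.5x and tolerates three lost fragments. (The 6+3 parameters are illustrative; the talk's actual production configuration isn't stated here.)

```python
def replication_overhead(copies: int) -> float:
    """Raw bytes stored per logical byte under n-way replication."""
    return float(copies)

def erasure_overhead(data_frags: int, parity_frags: int) -> float:
    """Raw bytes stored per logical byte under a (data+parity) erasure code."""
    return (data_frags + parity_frags) / data_frags

print(replication_overhead(3))  # 3.0x, tolerates 2 lost copies
print(erasure_overhead(6, 3))   # 1.5x, tolerates 3 lost fragments
```

Halving raw storage per logical byte is what translates directly into the hardware-cost reduction the abstract describes; the trade-off is extra CPU for encode/decode and more complex repair.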

Erasure Coding in Windows Azure Storage

Cheng Huang, Researcher, Microsoft Research

Abstract

Erasure coding is being adopted in planet-scale cloud storage systems, from HDFS at Facebook and GFS II at Google to Windows Azure Storage at Microsoft. It promises to reduce hardware and operational costs by 50%. While many systems settle for conventional erasure coding schemes, invented more than 40 years ago and first applied in deep-space communication, Microsoft Research and Windows Azure Storage have partnered and designed advanced new erasure coding schemes, optimized specifically for cloud storage, to achieve significantly more savings.

This presentation is targeted at storage developers with a strong desire to reduce storage system costs and operate more economically. We will first review conventional erasure coding schemes and then explain our advanced new erasure codes. We will describe how the new codes are being used in Windows Azure Storage and share lessons and experience learned from real production.

Learning Objectives

  • Refresh your knowledge of erasure coding. (Some knowledge is assumed – this is *not* a tutorial on erasure coding. For basics, see Jim Plank’s USENIX-FAST tutorial “Erasure Codes for Storage Applications”.)
  • Get an overview of erasure coding as currently applied to large-scale storage systems, specifically Cloud Storage.
  • Understand a class of erasure codes (used in Windows Azure Storage) that is useful and economical for Cloud Storage systems.
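For readers refreshing the basics, the simplest possible erasure code is single XOR parity: one extra fragment lets you rebuild any one lost data fragment. (The codes used in Windows Azure Storage are far more sophisticated; this sketch only shows the underlying recover-from-parity idea.)

```python
def xor_parity(fragments):
    """Compute a parity fragment as the bytewise XOR of all fragments."""
    parity = bytearray(len(fragments[0]))
    for frag in fragments:
        for i, b in enumerate(frag):
            parity[i] ^= b
    return bytes(parity)

def recover(fragments, parity, lost_index):
    """Rebuild one lost fragment by XORing parity with the survivors."""
    survivors = [f for i, f in enumerate(fragments) if i != lost_index]
    return xor_parity(survivors + [parity])

data = [b"abcd", b"efgh", b"ijkl"]
p = xor_parity(data)
rebuilt = recover(data, p, 1)  # any single lost fragment can be recovered
```

Production codes generalize this to multiple parity fragments over a Galois field, trading a little encode/decode CPU for tolerance of several simultaneous failures.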



DEDUPE

Primary Data Deduplication in Windows Server 8

Sudipta Sengupta, Senior Research Scientist, Microsoft Research
Jim Benton, Principal Software Design Engineer, Microsoft Research

Abstract

We present the architecture of the Windows Server 8 primary data deduplication system that is designed to achieve high deduplication savings at low computational overhead on commodity storage platforms. High deduplication savings previously obtained using small ~4KB variable length chunking are achieved with 16-20x larger chunks. A more uniform chunk size distribution and increased deduplication savings are obtained using a new regression chunking algorithm. The challenge of scaling deduplication processing resource usage with data size is addressed using a RAM frugal chunk hash index and data partitioning, so that server resources remain available to fulfill the primary workload. Efficient performance and low RAM footprint associated with data access are maintained through the use of multiple techniques, including caching, read-ahead, and multi-level redirection tables.

Learning Objectives

  • Primary data deduplication and the additional challenges over backup deduplication in server storage platforms.
  • Sub-file-level deduplication, data chunking algorithms, and the tradeoff between deduplication space savings and chunk size.
  • How deduplication and compression work together.
  • Scaling data deduplication processing: memory- and I/O-efficient chunk hash index, partitioned processing and reconciliation of partitions.
  • Techniques for primary data serving when deduplication is enabled.
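As background for the chunking discussion above, here is a toy sketch of plain content-defined chunking (not the regression chunking algorithm this talk introduces, and the hash below is a stand-in, not real Rabin fingerprinting): a chunk boundary is declared wherever a hash of recent bytes matches a mask, subject to minimum and maximum chunk sizes, so boundaries follow content rather than fixed offsets and an insertion early in a file shifts only neighboring chunks.

```python
MASK = (1 << 6) - 1            # ~64-byte average chunks for this toy example
MIN_CHUNK, MAX_CHUNK = 16, 256

def chunk(data: bytes):
    """Split data at content-defined boundaries; concatenation round-trips."""
    out, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) ^ byte) & 0xFFFFFFFF   # toy hash, not a real fingerprint
        length = i - start + 1
        if length >= MIN_CHUNK and ((h & MASK) == MASK or length >= MAX_CHUNK):
            out.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        out.append(data[start:])             # tail chunk
    return out

payload = bytes(range(256)) * 4
parts = chunk(payload)
```

Each chunk would then be hashed and looked up in the chunk hash index; only chunks with unseen hashes need to be stored.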

Storage Efficiency in Clustered Storage Environment

Ankur Saran, Systems Engineer, Tata Consultancy Services Ltd

Abstract

Data storage efficiency, achieved mainly by deduplicating and compressing data, is very popular and is indeed one of the unique selling points for storage vendors. These features have an inherent problem: because their CPU-intensive scanning operations generate a significant workload, they are typically run as scheduled applications on the storage systems when CPU load is low. Despite the multiple dedupe and compression techniques available today, it has become difficult to handle data efficiently in production environments, including clustered and standalone storage appliances. In a clustered storage environment, however, there is often leftover CPU bandwidth that can be used to improve the data efficiency ratio without significantly affecting other operations. We will present a proof of concept for a storage efficiency framework in a clustered storage environment that better uses leftover CPU and maximizes overall cluster efficiency.

Learning Objectives

  • Understanding storage efficiency techniques
  • Study of the storage efficiency techniques in use today
  • Study of MapReduce and similar algorithms as distributed system algorithms
  • Use of distributed algorithms to enhance existing storage efficiency features
     

Prequel: Distributed Caching for the Masses

Christopher R. Hertel, Senior Principal Software Engineer, Red Hat
Jose Rivera, Software Engineer, Red Hat

Abstract

Prequel is an Open Source implementation of the Microsoft PeerDist distributed caching system, also known as BranchCache. PeerDist is a simple WAN acceleration system that can make SMB2 and HTTP run faster over wide area networks, but there are lots of other ways to leverage a distributed cache. The techniques used are related to those used in data deduplication and in tools like rsync.

In this presentation, we will discuss the internals of the Prequel implementation, including the problems we faced head-on (and stared down) plus the few that still keep us awake at night. We will also cover:

  • The integration of Prequel with SMB2 and HTTP servers
  • The integration of Prequel with file systems and other in-kernel subsystems
  • Leveraging distributed caching in the cloud and across wide area networks
  • Prequel and PeerDist documentation

Learning Objectives

  • Internals of PeerDist protocols
  • Design and limitations of the Prequel implementation
  • Integrating Prequel features into existing services
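The hash-based discovery that PeerDist builds on can be pictured with a small sketch. This mimics the idea only; it is not the PeerDist wire protocol, and all names and sizes below are invented: the server sends block hashes instead of data, and the client assembles the content from nearby peer caches, falling back to the origin only for misses.

```python
import hashlib

BLOCK = 8   # toy block size; real PeerDist blocks are much larger

def block_hashes(data: bytes):
    """Hash each fixed-size block; the hash list is tiny next to the data."""
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    return [hashlib.sha256(b).digest() for b in blocks], blocks

def fetch(hashes, blocks, peer_cache):
    """Prefer blocks from the peer cache; fall back to the origin server."""
    out, hits = [], 0
    for h, origin_block in zip(hashes, blocks):
        if h in peer_cache:
            out.append(peer_cache[h])    # served locally, no WAN round trip
            hits += 1
        else:
            out.append(origin_block)     # fetched from the origin over the WAN
    return b"".join(out), hits

data = b"the quick brown fox jumps over the lazy dog."
hashes, blocks = block_hashes(data)
peer_cache = dict(zip(hashes[:3], blocks[:3]))   # a peer holds the first blocks
content, hits = fetch(hashes, blocks, peer_cache)
```

The same structure explains the kinship with deduplication and rsync: all three identify redundant data by content hash rather than by location.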

FILE SYSTEMS

Archive eXchange Format (AXF) - An Open Standards-Based Approach to Long Term Content Archiving and Preservation

Brian Campanotti, CTO, Front Porch Digital

Abstract

Rather than relying on archaic IT-centric approaches, recent application-specific advancements in data storage better target this demanding area while ensuring long-term preservation of, and accessibility to, valuable media assets. This paper will give a detailed technical overview of the emerging Archive eXchange Format (AXF), an open standard for tape- and disk-based content storage. With its innovative "operating system per object" approach, AXF guarantees long-term accessibility to content via open tools while overcoming many of the limitations of other storage formats and technologies such as TAR and LTFS. End-user case studies will focus attention on the key features of the AXF standard and how it can be leveraged to ensure open access, protection, and transportability of media assets.

Learning Objectives

  • How users can solve long-term archive and preservation challenges
  • How the embedded file system makes all the difference
  • The advantages of encapsulation and scalability
  • How AXF is designed for long-term preservation
  • Status of work within the SMPTE TC-31FS30 WG Archive eXchange Format  (AXF) group

Object Oriented Storage and the End of File-Level Restores

Stacy Schwarz-Gardner, Strategic Technical Architect, Spectra Logic
 

Abstract

As unstructured data and cloud storage continue to grow, object-oriented storage is becoming an increasingly common method of data management. With it has come the ability to manage virtual and tiered storage environments at the object level. This has a profound impact on our conventional strategies around archive, backup, and long-term retention of data. Moreover, it allows data to transition between storage platforms such as SSD, HDD, tape, and the cloud without impacting end-user accessibility, eliminating the need for granular file restores to retrieve data from an archive or other storage tier. This session will give an overview of object-oriented file systems, abstracting tape and other storage platforms to function as disk, and best practices for achieving long-term retention in an object-oriented environment.

Learning Objectives

  • Why object oriented file systems are a better target for archive data than traditional backup methods. 
  • Why abstracting storage, particularly tape, to interact as disk is more efficient than abstracting disk to look like tape (VTL).
  • How to achieve long term data retention and preservation without the need for file level restores through object oriented storage.
  • Best practices in object oriented storage and data protection.

GlusterFS Challenges and Futures

Jeff Darcy, Principal Software Engineer, Gluster.org

Abstract

GlusterFS is a rapidly evolving distributed file system. This talk will start with a snapshot of what problems it already solves, then focus on some of the Hard Problems that it still faces - various forms of replication, "zillions of small files" workloads, and multi-tenancy. Possible solutions will be described, and - time permitting - some audience brainstorming might even occur.


Scalable, reliable, and Efficient Object Storage for Hadoop

Greg Dhuse, Senior Software Architect, Cleversafe, Inc.

Abstract

HDFS is the storage backbone that supports Hadoop's map/reduce jobs. Designed to run on inexpensive commodity hardware, HDFS uses triple replication to achieve reliability and availability. We have applied the scalability, reliability and efficiency benefits of information dispersal to yield a new implementation of Hadoop's FileSystem interface. This implementation is built on our existing object storage API and thus eliminates the overhead of replication while achieving superior fault tolerance. Further, it retains data-local computation, a primary feature and benefit offered by Hadoop. This presentation elaborates on the numerous challenges encountered in the creation of this implementation, and explains how each was overcome.

Learning Objectives

  • Design of HDFS, and special requirements of Hadoop
  • Basics of Information Dispersal and Namespace
  • Problems overcome through our new design for Hadoop's FileSystem
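A quick back-of-the-envelope comparison shows why dispersal is attractive against HDFS's triple replication: replication stores `copies` bytes per byte and tolerates `copies - 1` losses, while a k-of-n dispersal scheme stores n/k bytes per byte and tolerates n - k losses. The k and n values below are illustrative, not Cleversafe's actual configuration.

```python
def replication_overhead(copies: int):
    """Return (bytes stored per byte of data, node failures tolerated)."""
    return copies, copies - 1

def dispersal_overhead(k: int, n: int):
    """k-of-n dispersal: any k of the n slices suffice to read the data."""
    return n / k, n - k

assert replication_overhead(3) == (3, 2)          # HDFS default: 3x raw
stored, tolerated = dispersal_overhead(k=10, n=16)
assert (stored, tolerated) == (1.6, 6)            # 1.6x raw, 6 failures
```

Roughly half the raw capacity for better fault tolerance is the efficiency argument the abstract refers to; the engineering challenge is keeping Hadoop's data-local computation on top of sliced data.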

Implementing HDFS as a Protocol on OneFS

Jeff Hughes, Sr. Engineering Manager, EMC Isilon

Abstract

The Hadoop Distributed File System is not just a filesystem, but also a file access protocol.  This talk describes how OneFS has implemented HDFS as a protocol on top of OneFS.  It describes how the OneFS implementation maps HDFS features into a POSIX-like distributed filesystem.  There is also analysis of benefits and drawbacks to the OneFS HDFS implementation compared to the Apache HDFS implementation.

Learning Objectives

  • HDFS NameNode and DataNode protocols
  • OneFS implementation of the HDFS protocols
  • Differences between Apache HDFS implementation and the OneFS protocol-only implementation

Linux File and Storage Systems: Challenges and Futures

Rick Wheeler, Architect and Manager of the RHEL File System Team & Red Hat Storage, Red Hat

Abstract

The Linux community is at the cutting edge of many file and storage system technologies. This presentation will give an overview of the current activities in the open source Linux kernel world and also cover features that are shipping in Red Hat Enterprise Linux.

Learning Objectives

  • Learn about current work in the Linux kernel around file and storage systems.
  • Learn about what is currently supported in enterprise distributions.
  • Get an overview of Linux file systems

Scaling an Index to the Exabytes

Andrew Baptist, Lead Architect, Cleversafe, Inc.

Abstract

At the exabyte scale, even the metadata becomes so large that single systems are unable to manage or even store it. The only way to cope with trillions of objects and their associated trillions of pieces of metadata is to rethink the fundamentals of how metadata is stored, accessed, and manipulated in a massive system with potentially huge numbers of users. We detail our solution and show how it enables true horizontal scalability, with the surprising property that as the system grows, metadata contention actually decreases.

Learning Objectives

  • How the distributed object storage cluster operates at scale.
  • Why a rich object API is more useful than a traditional file API.
  • How cluster-coherent POSIX file access scales


Scaling Storage to the Cloud and Beyond with Ceph

Sage Weil, Creator and Founder, Ceph Storage, Inc.

Abstract

As the size and performance requirements of storage systems have increased, file system designers have looked to new architectures to facilitate system scalability. Ceph is a fully open source distributed object store, network block device, and file system designed for reliability, performance, and scalability from terabytes to exabytes.

Ceph utilizes a novel pseudo-random placement algorithm (CRUSH), active storage nodes, and peer-to-peer like gossip protocols to avoid the scalability and reliability problems associated with central lookup tables or gateway servers.  Ceph's architecture is based on an object storage service that provides a generic, scalable storage platform with support for snapshots and distributed computation.  This architecture allows much better scaling behavior than file-based distributed systems whose designs are constrained by legacy protocols like NFS.

This talk will discuss the distributed object storage layer, and ways in which it can be leveraged for cloud applications.  We will also discuss the POSIX distributed file system built on top of the object storage cluster, and how it achieves massive scales by rethinking the conventional client/server model.

Learning Objectives

  • How the distributed object storage cluster operates at scale.
  • Why a rich object API is more useful than a traditional file API.
  • How cluster-coherent POSIX file access scales to exabytes.
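The "no central lookup table" idea above can be made concrete with a minimal sketch using plain rendezvous (HRW) hashing. This shares the spirit of CRUSH but none of its hierarchy, weighting, or failure-domain awareness; device names are invented for the example. Every client computes the same placement from just the object name and the current device list, so no lookup table or gateway is consulted.

```python
import hashlib

def place(obj: str, devices, replicas=3):
    """Rank devices by a hash of (object, device); take the top `replicas`."""
    score = lambda dev: hashlib.sha256(f"{obj}:{dev}".encode()).hexdigest()
    return sorted(devices, key=score)[:replicas]

devices = [f"osd.{i}" for i in range(8)]
placement = place("volume1/block42", devices)

# Any client computes the identical placement from the same inputs:
assert placement == place("volume1/block42", devices)

# Losing a device disturbs only placements where that device appeared:
survivors = [d for d in devices if d != placement[0]]
replacement = place("volume1/block42", survivors)
assert replacement[:2] == placement[1:]   # the other two replicas stay put
```

The stability property in the last assertion is what keeps data movement proportional to the change in the cluster rather than to its size.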


ReFS - Next Generation File System for Windows

J.R. Tipton, Principal Software Development Engineer, Microsoft
Malcolm Smith, Senior Software Design Engineer, Microsoft

Abstract

ReFS is a new filesystem for Windows that is designed to be highly resilient and scalable, while being backward compatible with a subset of NTFS features. ReFS has been described previously in the following blog post: http://blogs.msdn.com/b/b8/archive/2012/01/16/building-the-next-generati..., but this presentation will be a technical drill-down directed at the SDC audience of storage engineers, with more depth and detail than provided previously.

Learning Objectives

  • Understand the motivations behind some of the design choices for ReFS.
  • Understand when to use this filesystem in Windows 8, in terms of supported features and tested configurations

What the Evolving Apache Hadoop Ecosystem Will Mean for Storage Developers

Sanjay Radia, Founder, Hortonworks 

Abstract

During the last 12 months, the Apache Hadoop ecosystem has experienced tremendous growth and empowered enterprises to better handle large volumes of data. In many ways, this data explosion is outpacing current storage, management and processing approaches. As a result of the growing ecosystem, HDFS, the storage engine for Apache Hadoop which allows data to be processed in parallel, has evolved, enabling better isolation, faster startups and upgrades, and better scalability. In this presentation, Sanjay Radia, one of the founders of Hortonworks, will discuss the latest advancements in HDFS, what improvements are currently in the pipeline, and explain how these changes will drive the future of storage in the enterprise.

Learning Objectives

  • How HDFS provides developers performance improvements for local access
  • Why HDFS does not use disk RAID, and what impact this has on recovery, reliability and operational management
  • What enterprise support improvements are in the pipeline, including support for snapshots and greater storage efficiency
  • Should cold archival data sit on separate clusters? What are some of the options?

The Btrfs Filesystem

Chris Mason, Director of Kernel Engineering and Lead Developer of the Btrfs Filesystem, Fusion-io

Abstract

Btrfs is maturing into a stable and reliable base for a variety of workloads. We will discuss Btrfs filesystem internals and future plans. The latest features will be demonstrated, along with important areas of current development.


 

HARDWARE

Energy Efficiency Metrics for Storage

Herb Tanzer, Storage Hardware Architect, Hewlett Packard Co.
Chuck Paridon, Storage Performance Architect, Hewlett Packard Co.

Abstract

Energy efficiency metrics in terms of Idle Capacity/Watt, IOPs/Watt, and MB/s/Watt are starting to appear in the storage industry. These metrics can be used by the data center for power budget planning, and to reduce capital and operating expenses. SPC-1/E and SPC-2/E are storage performance benchmarks with energy extensions. The SNIA Emerald™ program enables the objective comparison of storage products for their energy efficiency. Slated to appear late this year or early next, the EPA Energy Star™ certification for datacenter storage is expected to require meeting specific checklist items, and submitting energy efficiency data on optimized systems.  Another metric formulated by The Green Grid™ is the data center storage efficiency (DCsE), which focuses on efficiencies during actual operation, as opposed to the above mentioned metrics which are applicable at the point of sale.

Learning Objectives

  • Describe the benefits of energy efficiency metrics
  • Compare the various energy efficiency metrics
  • Describe methodologies for arriving at optimal metrics, including modeling tools
  • Provide examples of some early data submissions for storage arrays, consisting of configurations tuned for HDD and SSD devices
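As a concrete illustration of how the point-of-sale metrics named above are computed (all numbers below are invented for the example, not measurements of any product or benchmark submission):

```python
def efficiency_metrics(capacity_gb, idle_watts, iops, mbps, active_watts):
    """Compute the three metric families: capacity/W at idle, IOPS/W and
    MB/s/W under load, as reported by SPC-1/E- and SPC-2/E-style runs."""
    return {
        "idle_GB_per_W": capacity_gb / idle_watts,
        "IOPS_per_W": iops / active_watts,
        "MBps_per_W": mbps / active_watts,
    }

m = efficiency_metrics(capacity_gb=48000, idle_watts=600,
                       iops=120000, mbps=2400, active_watts=800)
assert m["idle_GB_per_W"] == 80.0
assert m["IOPS_per_W"] == 150.0
assert m["MBps_per_W"] == 3.0
```

Operational metrics such as The Green Grid's DCsE differ in that the numerator and denominator are measured over actual production activity rather than a benchmark run.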

Getting the Most from Your Hadoop Nodes

Gus Malek-Madani, CEO and Founder, Green Platform Corporation

Abstract

Getting the most out of the hardware is key to overall system performance. As Hadoop becomes mainstream, most hardware manufacturers have begun to offer Hadoop nodes containing both servers and storage that are specifically designed for Hadoop clusters. But which factors impact Hadoop storage performance and manageability? As a follow-up to his well-received presentation at SDC 2011, Green Platform CTO and founder Gus Malek-Madani, a leading expert in vibration management, will share test results that show how normal levels of micro-vibration in data centers add latency and reduce the effective performance of your Hadoop clusters.

Learning Objectives

  • Understand the Vibration Penalty on the performance of Hadoop nodes 
  • Understand how performance-killing vibration exacerbates the storage bottleneck in Big Data applications
  • Understand the benefits of removing vibration 

Tiered Storage and Caching Decisions

Bob Griswold, Director of Industry Software Architecture, Western Digital Corp.

Abstract

“Storage is stale,” say the chipset guys, the OS guys, the flash guys. Storage is as storage needs to be: storage, a place to put your stuff. Just like in the real world you don’t park your commuter car in the garage (that’s for your project or muscle car), why would you only listen to a guy selling you SSDs about where to put your *megaloads* of data? Tiered storage is all the rage; let’s use faster storage in front of slower storage in front of archive storage; sounds like the RAID vendors and tape guys all over again. This presentation will look at tiered storage, changes in the fabric of the basic unit of storage (the LBA), and how to actually accelerate your storage so the *user* of the data sees the difference, because we all get measured there, not at the data sheet.

Learning Objectives

  • Learn how storage, IO, and resources in your system are impacted by the decisions you make in attempting to manage those resources.
  • Become familiar with the underlying methods of storage caching, what’s been around forever, and what’s on the horizon.
  • Understand why storage as it exists today is going to be around for a while, and how to make the best use of those resources.

InfiniBand Technology and Usage Update

Erin Filliater, Enterprise Market Development Manager, Mellanox Technologies

Abstract

The new InfiniBand speeds beyond QDR are FDR, EDR, HDR and NDR. FDR solutions were introduced to the market in mid-2011. The EDR specification was updated to provide 26Gb/s per lane, or 100Gb/s per 4x EDR port. The roadmap details 1x, 4x, 8x and 12x EDR and FDR, incorporating new link-level signaling with 64/66 encoding and a new reliability mechanism (Forward Error Correction). The newly defined link speeds, reliability mechanisms and transport are designed to keep the rate of performance increase in line with system-level performance increases. The session will provide a detailed review of the new InfiniBand speeds, features and roadmap, as well as new RDMA-based developments such as Microsoft SMB3 and software accelerations based on the new concept of a co-design architecture.

Learning Objectives

  • Detailed understanding of the new InfiniBand capabilities 
  • View into InfiniBand roadmap (2015-2016)
  • Usage of RDMA for storage acceleration
  • Taking advantage of RDMA – examples (SMB3)
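The payoff of the move to 64/66 encoding mentioned above is simple arithmetic: 8b/10b line coding (used through QDR) spends 20% of the signaling bandwidth on coding overhead, while 64/66 spends about 3%. The lane signaling rates below are the published InfiniBand values.

```python
def data_rate(signal_gbps, payload_bits, coded_bits, lanes=4):
    """Effective data rate of a multi-lane link after line-coding overhead."""
    return signal_gbps * payload_bits / coded_bits * lanes

qdr = data_rate(10.0, 8, 10)          # 8b/10b: 32 Gb/s per 4x port
fdr = data_rate(14.0625, 64, 66)      # 64/66: ~54.5 Gb/s per 4x port
edr = data_rate(25.78125, 64, 66)     # 64/66: 100 Gb/s per 4x port
```

The same function shows why 8x and 12x link widths scale the port rate linearly: only the `lanes` factor changes.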



KEY NOTE AND FEATURED SPEAKERS

Unified Storage for the Private Cloud

Dennis Chapman, Senior Technical Director, NetApp

Abstract

Technology trends such as virtualization and flash have disrupted the storage industry over the past 5 years. However, the great majority of storage clients still use either file protocols (NFS, CIFS/SMB) or block protocols (iSCSI, FC) to access their networked data. As an enterprise consolidates and virtualizes its data centers into a private cloud, it must consider the best integration of compute, network and storage resources using these storage protocols. OS/app instances frequently have specific requirements of their storage access method. Storage can be provided by the hypervisor, or the client can directly mount a networked resource. However the storage is accessed and wherever it is located, the data needs to be protected. These topics will be covered in this talk.


Windows File and Storage Directions

Surendra Verma, Development Manager, Storage and File Systems, Microsoft

Abstract

Highly scalable and cost-effective storage is a reality when it comes to cloud infrastructure such as map-reduce, indexing and other forms of blob storage. The same can’t be said about the more traditional applications that continue to rely on traditional file semantics and APIs. Windows 8 storage investments allow us to move in the direction of scalable, cost-effective storage for the applications that continue to rely on file semantics and APIs. Some of these have been previously talked about at SDC, including “Storage Spaces” and ReFS. A key innovation here is the notion of resiliency to various kinds of failures. This flips the default assumption from “the hardware under me will give me reliable storage” to “the hardware under me will make a best effort, but I have to expect failures and errors”. Another key assumption is that availability is key to scalability. When you’re dealing with errors on a PB volume, it’s far better to make parts of it available sooner than to make everything unavailable in the hope that everything will be available much later (it usually isn’t). There is usually another form of data redundancy present that allows application writers and administrators to recover the lost parts, and the sooner we can allow them to do so, the better. The goal of this talk is to provide the broad context around these and other innovations in the Windows storage stack.

Learning Objectives

  • Understand the storage and file system innovations in Windows 8
  • Understand how these innovations are foundational for further changes in Windows

Linux Filesystems: Details on Recent Developments in Linux Filesystems and Storage

Chris Mason, Director Kernel Engineering, Fusion-io

Abstract

From cell phones to HPC, Linux filesystems are used in every imaginable workload. We'll discuss advancements in the Linux storage stack, and how our filesystems are changing to meet the huge variety of ways Linux is used today. New features in Ext4, XFS and Btrfs will be outlined, along with detailed performance analysis and benchmarks.

Upcoming storage technologies play an important role in filesystem development, and we will overview some of the ways filesystems are adapting to the latest standards and hardware. Flash technology brings huge new challenges to filesystems, and we will cover some of the ways filesystems are innovating with flash.


Non-volatile Memory in the Storage Hierarchy: Opportunities and Challenges

Dhruva Chakrabarti, Senior Scientist, HP

Abstract

New non-volatile memory technologies such as phase change memory and memristors are likely to result in an architectural redesign of the storage hierarchy in future computers. The ability to support a byte-addressable interface, access latencies comparable to DRAM, and high density will allow all or part of main memory to be non-volatile. Persistent storage will be much closer to the CPU and no translation between object and storage formats will be required. By virtually eliminating the performance gap between transient and persistent storage, this change has the potential to redefine the data persistence model used by applications. Data structures will be instantly persisted, preserved across system restarts, and available for reuse and sharing across applications.

However, there are significant challenges, from hardware issues such as endurance to software aspects such as data consistency. Processor caches and buffers between the CPU core and main memory will remain, allowing volatility in the system. If a program fails, persistent data may be incompletely updated or data invariants may not be satisfied. This talk will focus on software issues surrounding data consistency on non-volatile memory because of hardware or software failures. We will examine a few solution approaches, the tradeoffs involved, and the need for additional functionality in this regard.
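One way to picture the consistency problem described above is a crash landing between two dependent in-place updates. A classic software remedy is undo logging: record the old value before each write, discard the log on commit, and roll back from the log after a crash. The Python below only simulates the idea and is not the talk's proposal; real NVM code must additionally order cache-line flushes and fences, which Python cannot express.

```python
class PersistentDict:
    """Toy model of failure-atomic updates to 'persistent' state."""
    def __init__(self):
        self.store, self.undo_log = {}, []

    def update(self, key, value):
        self.undo_log.append((key, self.store.get(key)))  # log old value first
        self.store[key] = value                           # then write in place

    def commit(self):
        self.undo_log.clear()                             # updates are durable

    def recover(self):
        """After a crash, roll back uncommitted updates in reverse order."""
        for key, old in reversed(self.undo_log):
            if old is None:
                self.store.pop(key, None)
            else:
                self.store[key] = old
        self.undo_log.clear()

d = PersistentDict()
d.update("head", 1); d.commit()
d.update("head", 2); d.update("tail", 2)   # "crash" strikes before commit
d.recover()
```

After recovery the store again satisfies its invariant (only committed state is visible), which is exactly the property that volatile caches between the CPU and NVM put at risk.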


Building Next Generation Cloud Networks for Big Data Applications

Jayshree Ullal, President and Chief Executive Officer

Abstract

The advent of cloud computing changes the approach to datacenter networks in terms of scale, programmability and resilience. The ability to control, visualize and customize the cloud network is an important evolution enabled by software-defined networking. Big data, cloud and virtualization are demanding a new level of agility in networking: one can deploy applications more rapidly across shared server and storage resource pools than is possible with conventional enterprise solutions. This is very difficult to accomplish with the traditional silo computing model and necessitates a new cloud stack.


Storage Systems for Shingled Disks

Garth Gibson, Professor, Computer Science, Carnegie Mellon University, and Co-Founder and Chief Scientist, Panasas Inc.

Abstract

This talk will discuss how Hadoop/HDFS file systems can be adapted to utilize potential explicit Shingled Disk (SMR) APIs.


The Big Deal of Big Data to Big Storage

Benjamin Woo, Founder and Managing Director, Neuralytix

Abstract

Storage represents the fastest growth opportunity to the Big Data market. IDC will present the who, what, where, why of how storage participates in the Big Data market.


The Evolving Apache Hadoop Eco System - What It Means for Big Data Analytics and Storage Developers

Sanjay Radia, Co-founder, Hortonworks

Abstract

This talk will cover Hadoop's impact on the storage industry and what Hadoop means for big data analytics.


SSSI PCIe Round Table - A look at Emerging PCIe Technologies

Moderator: Eden Kim, CEO, Calypso Systems, Inc.

Panelists: Marty Czekalski, Interface & Emerging Architecture Program Manager, Seagate Technology
Dr. Easen Ho, CTO, Calypso Systems, Inc.
Tony Roug, Solutions Architect, Virident Systems

Abstract

The SSSI PCIe Round Table will look at emerging technologies, standards and deployment models for client and enterprise SSDs. Join respected industry technologists from Seagate, Virident and Calypso who will discuss PCIe NVMe and SOP/PQI protocols, enterprise server deployment architectures, advanced PCIe performance test and measurement, and field questions from the moderator and audience.

NFS

NFSv4 Protocol Development

Alex McDonald, CTO Office, NetApp

Abstract

The NFSv4 protocol undergoes a repeated lifecycle of definition and implementation. The presentation will be based on years of experience implementing server-side NFS solutions up to NFSv4.1, with specific examples from NetApp and others. We'll examine the lifecycle from a commercial implementation perspective: what goes into the selection of new features, the work with the IETF NFS standards body, the development process and how these features are delivered, and the impact these features have on end users. We'll also cover the work of Linux NFS developers and provide suggestions for file system developers based on these and vendor experiences. Finally, we'll discuss how implementation and end-user experience feed back into the protocol definition, along with an overview of expected NFSv4.2 features.

Learning Objectives

  • Understand the NFS protocol & its application to modern workloads
  • An overview of the IETF NFS Working Group, and the work it undertakes
  • How NFSv4.1 is being implemented by vendors and Linux developers 
  • The differences between NFSv3 and NFSv4.1, pNFS, FedFS
  • An overview of proposed features in NFSv4.2

NFSv4.1 Server Protocol Compliance, Security, Performance and Scalability Testing: Implement the RFC, Going Beyond POSIX Interop!

Raymond Wang, Senior Software Design Engineer in Test, Microsoft

Abstract

In a world where interoperability between heterogeneous systems becomes increasingly important, ensuring that the Microsoft NFS 4.1 server implementation was respecting the requirements of the RFC was a top priority when we started Windows Server 2012. This talk will focus on detailing the testing approach that allowed the team to deliver a high quality server implementation, even as industry NFSv4.1 clients were still in development.  It will also cover some of the tests that have been run during industry interop events to help increase the level of interoperability of the NFSv4.1 ecosystem.

Learning Objectives

  • NFSv4.1 protocol test challenges 
  • Test architectural overview
  • Building RFC compliance, security, performance and scalability test suites

 


NFSv4.1 Architecture and Tradeoffs in Windows Server 2012

Roopesh Battepati, Principal Development Lead, Microsoft

Abstract

The Microsoft Windows platform has supported the NFS protocol for heterogeneous environments for over a decade. This support has been significantly enhanced in Windows Server 2012 with a server-side implementation of the latest version of the NFS protocol, version 4.1. This talk will cover architectural features, implementation tradeoffs and limitations of the NFS Server in Windows.

Learning Objectives

  • Understand which portions of the RFC 5661 were implemented in Windows Server platform 
  • Understand design and functionality of Windows NFS Server implementation 
  • Understand how to programmatically manage NFS features via the NFS CIM provider

Scaling Oracle with pNFS: Improving database efficiency for scale-out architecture

Bikash Roy Choudhury, Solutions Architect, NetApp

Abstract

Oracle on Red Hat Enterprise Linux over traditional NFSv3 has successfully driven a lot of mission-critical business on NetApp storage controllers over the years. Parallel NFS (pNFS), part of NFSv4.1, has revolutionized the IO path by isolating metadata, data and control communications between the client and the server, unlike traditional NFS. RHEL 6.2 and NetApp’s scale-out architecture in Data ONTAP 8.1 are, respectively, the first production-ready pNFS client and server implementations. Oracle databases can definitely benefit from the scalability and direct data access to the database files. pNFS provides more resiliency in load-balancing the database workload seamlessly and improves manageability when an Oracle database is stored on a NetApp controller with tiered storage within a cluster namespace. A solution built around Oracle on RHEL 6.2 and a NetApp controller further validates the efficiency of the database over pNFS and its coexistence with other versions of NFS concurrently.

Learning Objectives

  • pNFS support for files in RHEL6.2 and Data ONTAP 8.1
  • Benefits of Oracle with pNFS and NetApp Cluster-Mode 

Testing 'Continuously Available' File Servers - An end-to-end service viewpoint

Tsan Zheng, Sr. Test Lead, Microsoft
Aniket Malatpure, Sr. Test Lead, Microsoft

Abstract

One of the biggest innovations for the File Server in Windows Server 2012 is its new Continuous Availability capability for both the SMB 3.0 and NFS protocols. The main objective behind this feature is to ensure that applications storing their data on the File Server can do so without downtime in the event of maintenance or an unplanned failure of the File Server cluster. This talk explains how the test team tackled the challenge of ensuring that end-to-end scenarios based on Windows Server 2012, as put in place by customers, meet today's stringent needs for availability.

Learning Objectives

  • Understand how the test team modeled end to end solutions for Application Servers based on Windows Server 2012 clustered File Servers
  • Understand the workloads, faults, and administrative actions included in the simulation
  • Learn about the infrastructure the team put in place to efficiently execute on this testing and monitor result

PERFORMANCE/WORKLOAD

Hyper-V Storage Performance and Scaling

Liang Yang, Senior Performance Engineer, Microsoft
Joe Dai, Principal Software Design Engineer, Microsoft
 

Abstract

In this session, we cover the changes to the Windows Server 2012 Hyper-V storage stack that target performance and scale. We will look at various workloads (VDI, SQL, etc.) and their IO characteristics when virtualized, and analyze performance on local as well as SMB3 configurations.

Learning Objectives

  • Understand Hyper-V storage stack and IO model, as they interact with underlying storage devices, local and SMB file servers.
  • Understand how IO patterns of ‘native’ application workloads such as VDI and SQL Server  are affected by operating within a virtual machine. 

The Virtual Desktop Infrastructure Storage Behaviors and Requirements

Spencer Shepler, Performance Architect, Microsoft

Abstract

Virtual Desktop Infrastructure (VDI) environments enjoy continued growth in deployment and use.  This growth increases the need to understand the storage utilization and characteristics of VDI installations. This presentation will cover the Hyper-V VDI deployment model along with the storage utilization patterns as the Windows VDI guests move through their life-cycle.  Detailed performance analysis will be shared to provide insight into what is needed to successfully build SMB 3.0 shared storage solutions for VDI deployments.  The attendee will gain insight into the VDI storage and networking requirements along with the ability to do basic analysis of a SMB 3.0 workload and how to analyze its performance.

Learning Objectives

  • Gain knowledge of Hyper-V VDI storage behaviors
  • Understand how to analyze SMB3.0 performance

SQL Server: Understanding the Application/Data Workload, and Designing Storage Products to Match Desired Characteristics for Better Performance

Gunter Zink, Principal Program Manager, Microsoft
Claus Joergensen, Principal Program Manager, Microsoft

Abstract

At SDC-2011 we announced the new SQL Server version code-named ‘Denali’, and showed the engineering work involved in getting SQL Server to run over a file access protocol like SMB.  A year later, we have lots of additional information to share on the application and data workloads that SQL Server generates.  We’ll discuss how you can use this data to design your storage products, regardless of whether they are SAN-attached block storage arrays or file servers accessed via the SMB 3.0 protocol.  We will demonstrate SQL Server running in various configurations to show scale-up, scale-out, and fail-over capabilities.  The latter part of this presentation will show significant performance data from SQL Server running on various systems, with emphasis on SMB 3.0 and SMB-Direct (RDMA).

Learning Objectives

  • Understand the application workload characteristics of SQL Server. 
  •  Understand how SQL Server accesses file-based storage, using the SMB protocol family.
  •  Understand how SQL Server performance data can help you design storage systems that are more performant with this important workload.

The Future of Protocol and SMB2/3 Analysis

Paul Long, Sr. Program Manager, Protocol Engineering Framework, Microsoft Corporation

Abstract

Pending

PLUGFESTS

 

Coming Soon

SECURITY IDENTITY

Local Accounts and Privileges in Likewise Storage Server

Rafal Szczesniak, Senior Software Engineer, EMC Isilon Storage Division

Abstract

Each SMB server implementation requires a local accounts database in order to support a security model involving local access rights. Such a database enables securing local objects by means of security descriptors, as well as system resources and privileged tasks by means of local privileges. This way both local and, more importantly, domain accounts can be granted different access rights on a per-system basis as needed. The talk presents the design and implementation of local accounts databases, their interfaces (local and RPC), their limitations, and their interactions with other parts of Likewise Storage Server.

Learning Objectives

  • Local accounts and privileges database backend interfaces
  • Difference between local account access rights and privileges
  • Local and RPC service interfaces for management 

Identity Mapping in the OneFS Clustered File System

Steven Danneman, Senior Software Engineer, EMC Isilon Storage Division

Abstract

Building a NAS appliance that seamlessly provides both SMB and NFS file sharing protocols requires supporting both the authentication and access control semantics of Windows and Unix. In a unified file system like this, a requirement for identity mapping arises between the authentication and authorization steps. ID mapping is a distinct third step that equates security identifiers from both domains in order to produce an authenticated ID that can be used in access control checks. This talk will cover the design and implementation of the Isilon OneFS identity mapping system.

Learning Objectives

  • Fundamental security object types on the Windows and Unix platforms.
  • A method of equating security objects between these two different domains.
  • A simple grammar for making runtime ID mapping decisions.
  • The Isilon implementation of these methods.
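
To make the "third step" idea concrete, here is a minimal, hypothetical sketch of a bidirectional mapping table between Windows SIDs and Unix UIDs; the class, names, and rules are illustrative assumptions, not the actual OneFS implementation.

```python
# Hypothetical sketch of a two-way identity-mapping table; names and rules
# are illustrative assumptions, not the actual OneFS implementation.

class IdMapper:
    """Equates Windows SIDs and Unix UIDs to a single usable identity."""

    def __init__(self):
        self._sid_to_uid = {}
        self._uid_to_sid = {}

    def add_mapping(self, sid, uid):
        # Record the equivalence in both directions so either credential
        # resolves to the same identity for access checks.
        self._sid_to_uid[sid] = uid
        self._uid_to_sid[uid] = sid

    def resolve(self, token):
        # After authentication, produce the ID used in authorization.
        if isinstance(token, str) and token.startswith("S-1-"):
            return self._sid_to_uid.get(token)
        return self._uid_to_sid.get(token)

mapper = IdMapper()
mapper.add_mapping("S-1-5-21-1004-513-1105", 10001)
print(mapper.resolve("S-1-5-21-1004-513-1105"))  # 10001
print(mapper.resolve(10001))                     # S-1-5-21-1004-513-1105
```

A real system would back the table with persistent state and runtime rules (the "simple grammar" mentioned above); the dictionary here only illustrates the equivalence step.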

SMB/SMB2/SMB3

Linux CIFS/SMB2

Steven French, Senior Engineer, IBM

Abstract

The Linux kernel clients have seen dramatic improvements in the past year: write and read performance is far better, and async support has improved. Encrypted transport support has also been added. In addition, the most current SMB2 kernel support will be demonstrated.

Learning Objectives

  • How to configure for better cifs/smb2 performance using Linux
  • How to configure transport encryption for Linux
  • Differences between cifs and smb2 transport encryption
  • Differences between SMB2 and CIFS mounts for Linux

Multiuser CIFS Mounts

Jeff Layton, Sr. Software Engineer, Red Hat

Abstract

Most Unix-based SMB/CIFS filesystems (Linux cifs, Linux smbfs, and others) have traditionally been designed to use a single set of credentials for all accesses to a particular mount. This design limitation presents challenges for administrators who would like local users to use their own credentials when accessing files on the server. This talk will cover the Linux CIFS client's multiuser code, which allows it to spawn and track new SMB sessions whenever a new user accesses the mount. Different aspects of the design will be covered, as well as the new cifscreds tool that extends this to non-krb5 auth.

Learning Objectives

  • Why using a single set of credentials is problematic on a multiuser system 
  • How the Linux cifs client remedies this with multiuser mounts
  • How to deploy multiuser cifs mounts under Linux in a typical scenario 
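
The session-tracking idea can be illustrated with a toy model (not the actual kernel code): each local uid that touches the mount gets its own tracked SMB session, rather than sharing the mount-time credentials.

```python
# Illustrative model of per-user session tracking on a multiuser mount;
# this is an assumption-level sketch, not the Linux cifs kernel code.

class Mount:
    def __init__(self):
        self.sessions = {}       # uid -> session id
        self._next_session = 1

    def access(self, uid):
        # Spawn a new SMB session the first time a uid accesses the mount;
        # subsequent accesses by the same uid reuse the tracked session.
        if uid not in self.sessions:
            self.sessions[uid] = self._next_session
            self._next_session += 1
        return self.sessions[uid]

m = Mount()
assert m.access(1000) == 1   # first user: new session with own credentials
assert m.access(1001) == 2   # second user: separate session
assert m.access(1000) == 1   # repeat access reuses the existing session
```

The real client must also obtain credentials for each new session (via krb5 tickets or keys registered with cifscreds), which this sketch leaves out.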

SMB 3.0 (Because 3 > 2)

David Kruse, Principal Software Development Lead, Microsoft

Abstract

This talk will introduce the feature set that falls beneath the SMB 3.0 family umbrella.   It will briefly summarize the features discussed last SDC under the SMB 2.2 protocol name, and then examine in more detail the aspects of the protocol that have been added, improved, or modified since then.  This will include a more detailed view of the Witness auxiliary protocol, an examination of encryption support and signing changes, the secure negotiate capability, file copy offload, and other changes.

Learning Objectives

  • Understand the overall scenarios and features that comprise an SMB 3 file server or client.
  • Dive deeper into the technical changes to the SMB protocol since the previous SDC conference, including those listed above.
  • Evaluate what aspects of the protocol are relevant for your client or server implementation or workload.

 


Understanding Hyper-V over SMB 3.0 Through Specific Test Cases

Jose Barreto, Principal Program Manager, Microsoft Corporation

Abstract

In Windows Server 2012, the new Hyper-V over SMB scenario was introduced, supporting the storage of live VMs on a file share using the SMB 3.0 protocol. In this session, we'll take a closer look at this scenario by describing specific test cases you can try yourself with either Windows Server 2012 File Servers or third-party implementations of SMB 3.0. These include: configuring the File Server (Standalone, Traditional, or Scale-Out) and the Hyper-V hosts (standalone or clustered); failing one of multiple NICs while a VM is running; Live Migrating and Storage Migrating VMs on an SMB share (in a cluster or standalone); saving, resuming, and snapshotting a VM on an SMB share; creating a Hyper-V Replica of a VM on an SMB share; performing planned and unplanned failover of both the File Server cluster nodes and the Hyper-V cluster nodes; and backing up a VM using VSS for SMB File Shares.

Learning Objectives

  • Describe the main scenarios for configuration of file server clusters and Hyper-V clusters using SMB file shares.
  • Understand the different types of Hyper-V Live Migration and Storage Migration related to SMB file shares.
  • Outline the different types of planned and unplanned failover cases for Hyper-V over SMB and how to provide continuously available SMB storage.

Continuously Available SMB Observations and Lessons Learned

David Kruse, Principal Software Development Lead, Microsoft
Matthew George, Principal Software Developer, Microsoft

Abstract

This talk will offer a walk-through of design points and protocol changes that enable continuously available SMB file access.  It will describe the iterative changes and thinking in the SMB protocol that took us from durability and resiliency in SMB2 to the CA support in SMB3, and discuss how the architecture built on lessons learned in each release.  It will also examine in greater detail some of the more complex aspects of handle recovery in CA, and look at what simplifying assumptions can be made during development as you step towards a final solution.

 

Learning Objectives

  • Understand the different forms of handle recovery in SMB 2 and 3, and how they build on each other, as well as determine which ones are relevant for you to support.
  • Get a solid overview of how handle recovery in SMB 3 works.
  • Understand some of the more subtle and technical aspects of handle recovery (for either client or server) at a deeper level.

 


Status of SMB2/SMB3 Development in Samba

Michael Adam, Software Engineer, Samba Team

Abstract

Samba, the well-known open source SMB server software, has supported version 2.0 of the SMB protocol since version 3.6. The one omission from SMB 2.0 support in Samba 3.6 is durable file handles. Meanwhile, versions 2.1 and (in preview) 3.0 of the SMB protocol have become available. Over the last months, the developers have worked hard on implementing durable handles, SMB 2.1, and basic support for SMB 3.0. This talk reports on the development progress of these features, which will be available in the upcoming version 4.0 of Samba.

 


The Evolution of Asynchronous IO in Samba

Jeremy Allison, Engineer, Google Samba Team

Abstract

Over the years we have tried many ways to increase the parallelism of the IO path within Samba for read and write requests from Windows and other CIFS and SMB2 clients.    This talk will cover the techniques we have explored, what worked and what didn't, and how these programming methods are applicable to Linux servers and other operating systems to increase the throughput of a Samba server. I will also  summarize the current state of the art within the Samba4 codebase.

Learning Objectives

  • Asynchronous IO programming
  • Samba
  • Open Source

Implementing Quality-of-Service Using SMB2 Crediting

Christian Ambach, IBM Samba Team

Abstract

Many users would like to establish QoS for their central fileservers to make sure that privileged clients will not be starved out by not-so-important clients. The SMB2 protocol introduced the crediting mechanism, which should be able to help here. But what are possible approaches to establishing QoS using SMB2 crediting? This session will cover potential approaches, sample implementations for Samba, and the measured effects.
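
One conceivable approach, sketched below purely as an assumption (not necessarily the sample Samba implementation from the talk), is to cap the credits granted per response based on client priority, which bounds how many requests a low-priority client may keep in flight.

```python
# Toy model of priority-aware SMB2 credit granting. The caps and the
# two-tier policy are illustrative assumptions, not a Samba feature.

def grant_credits(requested, privileged, max_privileged=128, max_normal=16):
    """Return the number of credits granted in a response.

    SMB2 clients may only have as many outstanding requests as they hold
    credits, so granting fewer credits throttles a client's concurrency.
    """
    cap = max_privileged if privileged else max_normal
    return min(requested, cap)

# A privileged client asking for 64 credits gets all 64; an ordinary
# client asking for the same is capped at 16 outstanding requests.
assert grant_credits(64, privileged=True) == 64
assert grant_credits(64, privileged=False) == 16
```

Because crediting only limits concurrency (not bandwidth per request), a real QoS scheme would likely combine this with scheduling on the server side.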

 


AD Everywhere: Active Directory Where You Wouldn't Expect It, using Samba 4.0

Andrew Bartlett, Authentication Developer, Samba Team

Abstract

Andrew will highlight the key features of the Samba 4.0 project for NAS vendors, cloud operators and anyone else with an imagination for taking the industry's dominant LAN authentication technology into places it hasn't been thought practical before.

With features like our RODC functionality, Samba 4.0 can bring AD to NAS devices operating as the only server at a remote site, or into the cloud, with the customer in control of how far they trust each new extension of the network.  With full access to the AD DRS replication protocol, Samba implementers can avoid custom and painful 'password filter' solutions while keeping user passwords in sync.

Andrew will also cover the broader Samba 4.0 release, the culmination of years of effort to bring our production file server and AD DC development streams together. Under this single release are also an SMB3 file server, print server, client libraries, python libraries, and many other great things. These have been taken to places we would never have imagined, such as underpinning the OpenChange Exchange client and the FreeIPA project.

No matter what your interest, Samba remains your single source for Microsoft networking interoperability solutions.

Learning Objectives

  • Understand the key features of Samba 4.0 for NAS Vendors and cloud operators
  • Reinforce that this is a continuation of great work of the Samba file server
  • Imagine the possibilities brought on by using the RODC feature on sites 'too small' to have an AD DC.

 


SMB 3, Hyper-V and Data ONTAP

Garrett Mueller, Senior Engineer, NetApp

Abstract

An overview of the Data ONTAP Cluster-Mode architecture and how the new SMB 3, Witness, Remote VSS and ODX protocols allow NetApp to deliver a full Windows Server 2012 Hyper-V over NAS solution.

Learning Objectives

  • Data ONTAP Cluster-Mode and distributed protocols 
  • SMB 3 in Data ONTAP
  • Witness in Data ONTAP
  • Remote VSS in Data ONTAP
  • ODX in Data ONTAP

Design and Implementation of SMB Locking in a Clustered File System

Aravind Velamur Srinivasan, Senior Software Engineer, EMC - Isilon Storage Division

Abstract

To implement SMB locking semantics, such as oplocks and byte-range locks, on a clustered file system such as OneFS, a distributed locking mechanism is needed to coordinate locks between different nodes in a cluster. This talk will examine the details of the design and implementation of such a distributed locking mechanism for SMB lock types in OneFS.

Learning Objectives

  • Fundamentals of distributed locking for a clustered file system
  • Challenges in implementing the SMB locking semantics on a clustered file system
  • Details of the design and implementation of the distributed lock manager for SMB in OneFS 
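
As background for the semantics above, here is a minimal sketch of byte-range lock conflict detection, the kind of check a distributed lock manager must perform cluster-wide; the logic is a generic illustration, not taken from OneFS.

```python
# Generic byte-range lock conflict check (illustrative, not OneFS code).
# A lock is (owner, offset, length, exclusive).

def overlaps(off1, len1, off2, len2):
    # Two half-open ranges [off, off+len) intersect.
    return off1 < off2 + len2 and off2 < off1 + len1

def conflicts(existing, new):
    """A new lock conflicts with an existing one held by a different owner
    when the byte ranges overlap and at least one lock is exclusive."""
    owner, off, length, exclusive = new
    for (e_owner, e_off, e_len, e_excl) in existing:
        if e_owner != owner and overlaps(e_off, e_len, off, length) \
                and (e_excl or exclusive):
            return True
    return False

locks = [("node1", 0, 100, True)]                    # exclusive lock on 0-99
assert conflicts(locks, ("node2", 50, 10, False))    # overlapping read denied
assert not conflicts(locks, ("node2", 200, 10, True))  # disjoint range is fine
```

In a cluster, the hard part is not this check but performing it coherently across nodes, which is where the distributed lock manager discussed in the talk comes in.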

Migrating Bing Maps to Windows Server 2012

Keith Hamilton, Software Architect, Microsoft Corporation

Abstract

Windows Server 2012 has made substantial improvements in storage functionality. Because of these new features, Bing Maps, which has about 2 PB of storage, is planning to migrate to Windows Server 2012 RC1. Benchmarks using the Windows Server 2012 Beta have been excellent, and Bing Maps is looking to reduce capex significantly and opex by about 90%.

Learning Objectives

  • New storage features of Windows Server 2012
  • Design for scalable storage
  • Cost reductions with greatly improved performance

SMB 3.0 Application End-to-End Performance

Dan Lovinger, Principal Software Architect, Microsoft Corporation

Abstract

This session discusses SMB 3.0 performance capabilities, focused on scenarios where the SMB 3.0 client is running an application server workload such as SQL Server. This includes a look at speeds and feeds for modern file serving configurations and breakdown of potential bottlenecks, followed by comparative analysis of different configurations and specific optimizations for application server workloads.

Learning Objectives

  • Understand end-to-end speeds and feeds, including transport overheads
  • Database OLTP Application breakdown
  • Bulk Data Copy breakdown

High Performance File Serving with SMB3 and RDMA via the SMBDirect Protocol

Tom Talpey, Software Architect, Microsoft Corporation
Greg Kramer, Senior Software Development Engineer, Microsoft Corporation

Abstract

This talk will focus on how SMB3 and RDMA enable high performance file servers/clusters for application workloads. The talk will introduce the design of SMB3 for performance over RDMA on highly parallel modern machines, will include benchmark results for several key workloads, will examine the performance tuning/diagnostic facilities exposed by SMB3, and will share some of the key performance lessons learned during the development of SMB3.

Learning Objectives

  • Understand the expected performance for a properly tuned SMB3/RDMA file server for several important workloads
  • Know what facilities exist for diagnosing and fixing SMB3/RDMA performance problems
  • Understand the ways in which machine architecture and application design affect performance

 


Techniques for Debugging File Protocol Performance Issues

Evgeny Popovich, Principal Software Engineer, EMC Isilon Storage Division

Abstract

This talk presents debugging techniques used to identify and solve performance issues in UNIX user-space resource-sharing protocol implementations such as the Likewise Storage Services platform. The protocols being tested are SMB1 and SMB2, while the main benchmarks used are FSCT and NetBench. Topics covered include analyzing the environment (network, storage subsystem, etc.), solving lock contention problems in multi-threaded daemons, memory allocation analysis, benchmark automation, and others.

Learning Objectives

  • Understand Likewise Storage Services architecture. 
  • Understand different aspects of debugging file protocol performance issues.
  • See how existing open source tools can be used to increase performance of a file server.

Testing Async SMB in Samba

Volker Lendecke, Co-Founder of SerNet, GmbH

Abstract

Samba has a long tradition as a single-threaded, one process per client SMB server. Within the SMB1 protocol this has served us very well. Clients typically were single-threaded as well, although the SMB1 protocol would have allowed multiple simultaneous requests on a single SMB connection.

With SMB2 this changed significantly. The clients now regularly do multiple requests simultaneously. Jeremy Allison has a separate talk about how Samba embraced threading for asynchronous I/O.

The Samba testing infrastructure is single-threaded as well. To test the async multi-issue behaviour of our server we have developed a C API to support event-driven client programs that fill a SMB transport with multiple simultaneous requests.

For a quick performance test at a customer site a C API is not flexible enough. Samba 4.0 will ship the start of a Python extension that allows multiple independent python threads to asynchronously drive a single SMB connection.

This talk will present the architecture of the new extension and how to use it in custom Python scripts to test specific workloads.
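
The shape of such a multi-threaded test driver can be sketched with a stand-in connection class; the actual Samba 4.0 Python bindings are not modeled here, and all names are placeholders.

```python
# Sketch of the testing idea: several threads share one "SMB connection"
# and issue requests concurrently. FakeSMBConnection is a stand-in; the
# real Samba 4.0 Python extension exposes its own (different) API.

import threading

class FakeSMBConnection:
    """Stand-in for a single SMB transport driven by many threads."""
    def __init__(self):
        self._lock = threading.Lock()
        self.requests = 0

    def write(self, path, data):
        with self._lock:        # serialize access to the shared transport
            self.requests += 1

conn = FakeSMBConnection()
threads = [threading.Thread(target=conn.write, args=(f"\\share\\f{i}", b"x"))
           for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert conn.requests == 8       # all simultaneous requests were issued
```

Swapping the stand-in class for the real binding would let the same script fill an SMB transport with multiple simultaneous requests against a server under test.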

SOLID STATE STORAGE

PCIe SSD Devices - A Year Later

Robert Randall, Windows Driver Architect, Micron Technology, Inc.

Abstract

As a follow-up to my presentation last year: many PCIe SSD products have entered the market in the last year. Standards bodies and the open source community have been busy supporting the continued delivery and evolution of PCIe SSDs. What is new? Where is the technology headed? How fast is it in the real world?

Learning Objectives

  • Understand the new development in the PCIe SSD technology space; standards and open source.
  • Understand what PCIe SSDs are available today and their characteristics.
  • Understand NAND factors; SLC vs MLC, risks, limitations, fit.
  • Understand deployment models; caching, hot-spots, etc.

I/O Virtualization - Enabling New Architectures in the Data Center for Server and I/O Connectivity

Sujoy Sen, Senior Manager, Software Architect, Micron Technology

Abstract

As data centers march towards more agility, a need arises to treat all resources, such as compute and I/O, as flexible and separate pools. Meanwhile, exciting technologies such as direct-attached PCIe flash are emerging that promise to bridge the performance gap between compute and I/O, thereby relieving the I/O bottleneck for applications. This talk will present I/O virtualization as a technology that can fulfill the promise of agility by decoupling the CPU from the I/O subsystem while bringing the benefits of direct-attached PCIe devices to a shared environment. Such a technology can provide an order-of-magnitude boost in the performance of current applications, open up new usage models in IT administration, and create opportunities for innovative appliance architectures.

Learning Objectives

  • State of the art in server I/O virtualization architectures and the underlying technologies
  • Usage models and innovation opportunities with I/O Virtualized architectures
  • A look at the role of PCI-E based I/O virtualization on PCI-E SSDs 

NVM Express - Delivering Breakthrough PCIe SSD Performance and Scalability

Paul Luse, Software Architect and Development Lead, Intel

Abstract

The NVMe interface will go a long way towards alleviating the performance and feature constraints of today’s relatively slow disk interfaces currently in use by PCIe SSDs. Solution components built around the new NVMe standard and developed especially for PCIe SSDs, systems, devices, drivers, development, test and compliance validation infrastructure are either available today or will be very shortly. These new solution components are providing a robust, high performance and feature rich environment that will enable applications to deliver ever greater performance and capabilities into the hands of end users.

Learning Objectives

  • Understand the role & major features of NVMe
  • Understand how NVMe enables optimal performance of PCIe SSDs
  • Understand the status of the NVMe infrastructure

Programming Models to Enable Persistent Memory

Andy Rudoff, Enterprise Storage Architect

Abstract

Today’s server operating systems and applications are designed for hardware platforms that contain low-latency, byte-addressable volatile memory, but require slow, synchronous block I/O interfaces to persist data. Emerging technologies such as memristor and phase-change memory create the possibility of a new tier of persistent media that is accessed like memory. This creates opportunities for server operating systems, file systems, and data services middleware, but also requires a new programming model. In this presentation Andy discusses possible APIs and programming models that enable the server software stack to optimize for persistent memory. In addition to reviewing possible API features and programming models, Andy will review current activity underway in the Linux developer community to optimize the Linux kernel, filesystems, and volume managers for persistent memory.

Learning Objectives

  • Persistent Memory
  • Emerging models for leveraging high-performance persistence
  • Developing APIs in the Linux community

Exploiting the benefits of direct native programming access to non-volatile memory devices

Ashish Batwara, Principal Storage Architect, Fusion-io

Abstract

Over the past few decades the interfaces for accessing persistent storage within a computer system have remained essentially unchanged. Simple seek, read, and write operations have defined the fundamental operations that can be performed against storage devices. These interfaces worked well for spinning media; however, they are sub-optimal for non-volatile memory (NVM) devices due to the different performance characteristics and access patterns of those devices. Given these differences, a non-volatile memory management system more closely resembles a file system than it does a disk drive. This presentation will highlight the mechanisms, and the associated benefits, of exposing direct native programming interfaces such as atomic-write, persistent TRIM, EXISTS, and key-value cache/store from non-volatile memory devices to applications.

Learning Objectives

  • How direct programming access helps simplify application code complexity
  • How direct programming access helps accelerate application performance
  • Real-life examples of integrating a few key applications with NVM devices using direct programming interfaces
  • Cost-benefit analysis
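
To illustrate what atomic-write semantics buy an application, here is a user-space emulation using the classic write-temp-then-rename pattern, the kind of workaround that NVM-native atomic-write primitives aim to replace; the example is generic and does not depict Fusion-io's interface.

```python
# User-space emulation of atomic-write semantics (illustrative only):
# either the old contents or the new contents are visible, never a mix.
# An NVM-native atomic-write primitive would provide this guarantee
# directly in the device, without the extra copy and rename.

import os
import tempfile

def atomic_write(path, data):
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, data)
        os.fsync(fd)              # flush file data before publishing it
    finally:
        os.close(fd)
    os.rename(tmp, path)          # atomic replacement on POSIX filesystems

atomic_write("record.db", b"balance=100")
with open("record.db", "rb") as f:
    assert f.read() == b"balance=100"
```

An application built on a native atomic-write interface could drop the temp file, the fsync, and the rename entirely, which is the code-complexity and performance benefit the objectives above refer to.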

The Solid State Storage (R-) Evolution

Michael Krause, Fellow Engineer, Hewlett-Packard

Abstract

Whether driven by the “instant on” experience or extreme application performance demands, solid state storage is permeating market segments at an ever-accelerating rate. This talk will explore the technology stepping stones required to create compelling customer value, starting with today’s NVM technologies and looking one, three, and five years into the future. We will see how careful choices of platform, enclosure, and storage stack capabilities can enable continuous creation of sustainable customer value.

Learning Objectives

  • What makes for compelling customer value
  • Storage stack evolution – smooth transitions (e.g. SCSI Express) vs. “right-turn” NVM incorporation (direct NVM access)
  • Next generation storage media and the infrastructure

Making Sense of the SSD Jungle for Relational Databases

Jim Ting, STEC, Inc.

Abstract

Most database performance experts have been waiting for solid-state storage to free them from constantly tuning their raw devices, SAN storage, and various file systems. Now we are at a tipping point where many companies are offering solid-state products in different form factors, with different interface types, and with custom intelligence in ASICs or FPGAs. This presentation will start with a summary of solid-state storage devices in the market today, explain how these technologies impact three major databases (Oracle, MS SQL Server, and MySQL), and outline key questions to ask when selecting solid-state storage for your database applications. Gurinder Brar started working with relational databases when Ingres and Oracle were very young companies. For the last 10 years he has worked for traditional storage companies to help position their products in database environments. Now he is responsible for defining relational database requirements for new solid-state products at STEC.

Learning Objectives

  • How to use SSDs for database applications
  • The SSD landscape is a wild west: what to watch for
  • What questions to ask the vendors

Intelligent Controllers at the heart of modern Solid State Storage Designs

Anil Vasudeva, President and Chief Analyst, IMEX Research

Abstract

The advent of advanced controllers and firmware allows transparent mitigation of earlier issues related to reliability, endurance, data retention, performance, ease of management, and quick integration of SSDs using existing storage interfaces. Beyond that, automated storage tiering software tools, using workload I/O access forensics and behavior signatures monitored over time and the ensuing smart migration of hot data non-disruptively to SSDs, have been the key design in obtaining over 475% improvement in IOPS and 80% improvement in response time at peak loads.

Learning Objectives

  • Learn the innards of the new generation of intelligent SSD storage controllers and automated smart-tiering designs: how they improve the performance, cost, reliability, and endurance characteristics of SSDs, and which workloads benefit most from the use of SSDs in enterprise storage systems
  • How advanced SSD controllers and firmware transparently mitigate earlier issues related to reliability, endurance, data retention, performance, ease of management, and quick integration using existing storage interfaces
  • Comparison of ECC technologies deployed in SSD controllers for future flash technologies

How Many IOPS is Enough

Thomas Coughlin, President, Coughlin Associates

Abstract

There are lots of SSDs on the market today offering IOPS (I/Os Per Second) performance in the thousands to hundreds of thousands, with indications that future models will offer speeds in the million-IOPS range. Meanwhile HDDs support from tens to hundreds of IOPS, depending on spindle speed and interface. Not every application can use the extreme performance of high-end SSDs, and some may not benefit from high IOPS at all. Since performance is tied to cost, users can save money if they understand how many IOPS the system really needs. This presentation will examine what makes an application require high IOPS and will profile applications according to their needs.

Learning Objectives

  • Learn what IOPS load is required for various common enterprise applications through examples and survey results.
  • Find out what combination of HDDs and SSDs can satisfy IOPS requirements for common enterprise activities.
  • See examples of how users have combined HDDs and flash memory to achieve cost-effective solutions that meet their application requirements.
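
The sizing question the abstract raises reduces to back-of-envelope arithmetic; the sketch below uses purely illustrative device numbers, not survey data from the talk.

```python
# Back-of-envelope IOPS sizing; all device numbers are illustrative
# assumptions, not figures from the presentation.

def devices_needed(required_iops, per_device_iops):
    # Ceiling division: partial devices don't exist.
    return -(-required_iops // per_device_iops)

# Hypothetical OLTP system: 5,000 transactions/s at ~4 IOs per transaction.
required = 5_000 * 4            # 20,000 IOPS
hdd_15k = 200                   # rough figure for a 15k-rpm HDD
sata_ssd = 40_000               # rough figure for a SATA SSD of the era

assert devices_needed(required, hdd_15k) == 100   # a large HDD array
assert devices_needed(required, sata_ssd) == 1    # a single SSD suffices
```

The comparison shows why the application's actual IOPS demand, not the device's peak rating, should drive the HDD/SSD mix and therefore the cost.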

Building Commercial Storage Systems from Consumer SSDs

John Hayes, Founder and Chief architect, Pure Storage

Abstract

Consumer-grade flash-based SSDs deliver great performance compared to disks, but their well-known vagaries make designing a cost-effective all-flash production-grade array that scales to petabytes a daunting challenge. To overcome this, the Purity architecture developed by Pure Storage implements global inline deduplication and achieves consistently low latency in highly-available all-flash arrays that are in production today. This discussion will cover the techniques that make it possible to perform the hundreds of thousands of metadata insertions and updates per second required to achieve this reliability. The architecture utilizes an insert-only database to enable parallelism both within and across controllers, and a variety of tactical mechanisms that maintain metadata at a size consistent with constantly changing application demands. The discussion will further explain how Pure Storage's data structures and algorithms maintain performance, data integrity and availability in the presence of broad classes of multiple failures of varying severities.

Learning Objectives

  • Learn the global data structures that deliver logical integrity without sacrificing recovery performance
  • How to use multi-core processors effectively to enhance the performance of SSD-based arrays 
  • How to scale a flash-based array to the petabyte range without sacrificing performance or data integrity

Revisiting Storage for Smartphones

Nitin Agrawal, NEC Laboratories America

Abstract

Conventional wisdom holds that storage is not a big contributor to application performance on mobile devices. Flash storage (the type most commonly used today) draws little power, and its performance is thought to exceed that of the network subsystem. In this paper we present evidence that storage performance does indeed affect the performance of several common applications such as web browsing, Maps, application install, email, and Facebook. For several Android smartphones, we find that just by varying the underlying flash storage, performance over WiFi can typically vary between 100% and 300% across applications; in one extreme scenario the variation jumped to over 2000%. We identify the reasons for the strong correlation between storage and application performance to be a combination of poor flash device performance, random I/O from application databases, and heavy-handed use of synchronous writes; based on our findings we implement and evaluate a set of pilot solutions to address the storage performance deficiencies in smartphones. Full paper: http://static.usenix.org/events/fast12/tech/full_papers/Kim.pdf
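
A small experiment in the spirit of the paper's finding on synchronous writes can be run anywhere Python is available; the workload shape (4 KiB appends, fsync per write) is an assumption, and absolute numbers depend entirely on the device.

```python
# Compare buffered writes with fsync-per-write, mimicking the
# "heavy-handed use of synchronous writes" pattern the paper identifies.
# Parameters are illustrative; results vary widely across devices.

import os
import tempfile
import time

def write_many(path, n, sync):
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    start = time.perf_counter()
    for _ in range(n):
        os.write(fd, b"x" * 4096)   # 4 KiB, a typical database page size
        if sync:
            os.fsync(fd)            # force each write to stable storage
    os.close(fd)
    return time.perf_counter() - start

with tempfile.TemporaryDirectory() as d:
    buffered = write_many(os.path.join(d, "buf"), 200, sync=False)
    synced = write_many(os.path.join(d, "sync"), 200, sync=True)
    print(f"buffered: {buffered:.3f}s  fsync-per-write: {synced:.3f}s")
```

On flash devices with poor random/sync write performance, the fsync-per-write run is typically slower by a large factor, which is consistent with the storage-bound application behavior the paper reports.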


PCI Express IO Virtualization Overview

Ron Emerick, Principal Hardware Engineer, Oracle Corporation

Abstract

The PCI Express IO Virtualization specifications, working with system virtualization, allow multiple operating systems running simultaneously within a single computer system to natively share PCI Express devices. This session describes PCI Express, Single Root and Multi Root IO Virtualization. The potential implications for the storage industry and data center infrastructures will also be discussed.

Learning Objectives

  • Knowledge of PCI Express Architecture and Performance Capabilities, System Root Complexes and IO Virtualization.
  • The ability for IO Virtualization to change the use of IO Options in systems.
  • IO Virtualization connectivity possibilities in the Data Center (via PCI Express).

SAS SSDs – Building Blocks for High-Performance Storage

Ulrich Hansen, Director, Market Development, HGST, a Western Digital Company 

Abstract

Serial Attached SCSI (SAS) is becoming the preferred storage device interface for Enterprise applications. Its many benefits include comprehensive high-availability features, mature host software stacks with robust reliability features, and widely supported industry standards with a strong performance roadmap going forward. As Enterprise SSDs utilizing the SAS interface became more widely available in 2011, SAS SSDs are emerging as the preferred building block in high-performance storage solutions for both server and storage systems in a variety of applications – from Web 2.0 infrastructure to traditional transaction processing and business intelligence.

Learning Objectives

  • Attendees will understand the key attributes and benefits for SAS SSDs and be able to identify key technologies implemented in SAS SSDs that are critical to meet the most demanding Enterprise requirements – for both SLC and MLC NAND configurations 
  • Attendees will hear about recent data on performance scaling of SAS SSDs in industry standard systems. This data will provide valuable guidelines on how to reach a desired performance and availability point with a flexible and cost-effective solution by deploying the appropriate type and number of SAS SSDs
  • Attendees will gain insight into the future of SAS SSDs, including the transition to an interface speed of 12Gb/s and the opportunities for MultiLink SAS

Is MLC Ready for the Enterprise?

Esther Spanjer, Director of Technical Marketing, SMART Storage Systems

Abstract

MLC NAND Flash was once considered unfit for the enterprise because it lacked one ingredient that was critical in the enterprise environment: endurance. While SSDs have quickly become the storage solution of choice for various applications, including OLTP, DSS, boot, webservers and VOD, many still hold the belief that MLC flash is only suitable for boot or read-intensive applications. Are they right? The success of SSDs in the enterprise has been phenomenal. However, in order for SSDs to truly move the needle in enterprise storage adoption, SSD vendors must find a way to reduce cost and make MLC work. While this may seem like a “pie in the sky” idea, storage vendors are already beginning to make this a reality.

Learning Objectives

  • Dispel the myth that MLC flash-based storage devices cannot meet enterprise endurance requirements.
  • Illustrate how enterprise storage vendors are already adapting MLC technology in storage devices through important technology breakthroughs.
  • Present real world test results that show how enterprise SSDs with MLC NAND flash can achieve up to 50 full random drive writes per day for five years.
  • Discuss the impact that these technologies will have on enterprise adoption of SSDs, and what this means for SSD prices in the near term.

STORAGE MANAGEMENT

NAS Management using Microsoft System Center 2012 Virtual Machine Manager and SMI-S

Alex Naparu, Software Design Engineer, Microsoft
Madhu Jujare, Senior Software Design Engineer, Microsoft

Abstract

Windows Server 2012 introduces support for deployment of virtual machines on file servers and NAS devices that support SMB 3.0. Virtual Machine Manager automates the provisioning of new file shares, share ACL management, and management of virtual machines on file shares, using SMI-S to manage NAS devices. Microsoft is working with industry partners to implement updates to SMI-S 1.6 profiles specifically for ACL management.

Learning Objectives

  • Windows Server 2012 supports SMI-S natively through the Storage Management API. This API is used by Virtual Machine Manager to automate NAS management scenarios end-to-end. This API is also available to any developer who wants to build value-add tools and products on top of Windows Server 2012
  • Deep dive into how Pass Through in the Storage Management API allows deeper integration with SMI-S providers. Learn how Virtual Machine Manager uses Pass Through to discover and manage NAS devices, provision shares, and modify ACLs on shares. Developers can use these same patterns for their own tools and products.
  • Lessons learned by the Virtual Machine Manager team – product development across a large number of SMI-S providers that span block, NAS, and fabric; engaging and working with the storage community to define updates to SMI-S; how SMI-S can help developers be more productive by reducing the development effort to manage more devices over time.

Service Level Objectives for Storage Solutions with Different Applications

Dr. M. K. Jibbe, Director of Test Architect and Quality for APG Products, NetApp
Kuok Hoe Tan, Senior QA Engineer, NetApp

Abstract

An approach focused on Service Level Objectives (SLOs) creates solutions that satisfy the needs of customers based on the target workloads. SLOs will be used in the selection of components to use, features to develop, parameters to tune and the overall QA strategy. Solutions will cover performance in the optimal, transitional and degraded states that are critical to the end customer. In a storage array system there are 1031 possible parameters for tuning an array to best fit an application profile. Such a difficult task is typically resolved by guidelines from the storage RAID vendor, and those guidelines are often hard to follow for the following reasons: 1) The recommendations do not fit the application profile being used by the customer. 2) Guidelines typically require a RAID expert to select the best tuning parameters to fit the application profile, due to the complexity of the RAID system and the sophisticated applications that exist today. 3) Guidelines cannot cover every possible application that exists in the SAN market today. The method of this paper resolves these common issues with configuring an array system by analyzing the parameters and the requirements of a customer to determine the best configurations for the applications running on the RAID system in question.

Learning Objectives

  • What are the challenges involved in configuring an array system to meet customers' application requirements?
  • What are the parameters involved in array system configurations? (SLO Criteria)
  • What are the tasks investigated by the method of this paper to meet SLO?
  • What are the basic steps for this configuration method to customize a RAID system to an application profile?
  • How do you verify that the array system is configured to meet a customer's requirements?
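
The selection method described in the abstract, matching tuning parameters to an application profile, can be sketched as a simple weighted scoring exercise (all profile names, configuration names, and numbers below are illustrative assumptions, not values from the paper):

```python
# Hypothetical sketch: choose the tuning configuration whose measured
# characteristics best match an application profile's SLO weights.

# SLO weight per metric for each application profile (sums to 1.0).
PROFILES = {
    "oltp":      {"random_iops": 0.7, "latency": 0.3, "throughput": 0.0},
    "streaming": {"random_iops": 0.0, "latency": 0.1, "throughput": 0.9},
}

# Normalized measured characteristics of each candidate configuration.
CONFIGS = {
    "raid10_small_stripe": {"random_iops": 0.9, "latency": 0.8, "throughput": 0.4},
    "raid5_large_stripe":  {"random_iops": 0.3, "latency": 0.4, "throughput": 0.9},
}

def best_config(profile_name):
    """Return the configuration with the highest weighted SLO score."""
    weights = PROFILES[profile_name]
    def score(cfg):
        return sum(weights[m] * CONFIGS[cfg][m] for m in weights)
    return max(CONFIGS, key=score)
```

With a full parameter space the selection would be far larger, but the principle is the same: encode the customer's application profile as SLO weights and rank configurations against them instead of asking the customer to be a RAID expert.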

Experiences Evolving SmarterILM

Mark Smith, Enterprise Architect, IBM GTS
Gabriel Alatorre, Software Engineer, IBM

Abstract

The Smarter ILM strategy announced by IBM in 1Q2011 required the integration of services and technologies from multiple IBM organizations. This presentation will review the evolution of the Smarter ILM core components that manage Storage infrastructure services and optimize mapping of data to Tiered Storage. Experiences gleaned from analysis tools run before implementing automated data migration across tiers and after implementing automated tiering will be presented. IBM's Research Division provided core technologies that integrate with IBM System & Technology Group products which are deployed and managed by Global Technology Services organizations. Smarter ILM component operation and integration with pre-requisite TPC and SVC products are detailed. Demonstration of Smarter ILM via replay of live sessions will be included in the presentation.

Learning Objectives

  • Understand methods for intelligent block storage tiering
  • Demonstrate impact of intelligent lifecycle storage management
  • Understand advantages of employing a Storage Catalog before automated tiering movements are initiated


State of SMI-S

Don Deel, Senior Technologist, Office of the CTO; Chair SNIA Technical Council

Abstract

10 years ago, the SNIA started working on a specification for the management of networked storage equipment in heterogeneous, multi-vendor environments. Known as SMI-S, this specification has grown through several releases and has been accepted as an ANSI standard and as an ISO standard. This session will cover the current state of SMI-S spec development, which versions are the current national and international standards, and what the plans are for taking SMI-S forward.

Learning Objectives

  • Learn about the current SMI-S spec development work
  • Understand the main functionality added by different SMI-S versions
  • See which SMI-S versions are ANSI and ISO standards
  • Understand what is planned next for SMI-S spec development

VIRTUALIZATION

Making a Virtualized Storage System Into Storage for Virtual Platforms - One Company's Journey

Lazarus Vekiarides, Executive Director of Software Engineering, Enterprise Storage, Dell 

Abstract

A lot has happened to the EqualLogic platform since the team noticed that a great deal of storage was being sold to customers of a certain virtualization platform in 2005. In trying to exploit the natural affinity of easy-to-use centralized storage in the virtualization market, a series of initiatives were set in motion that have transformed the value of the platform, and also radically changed what it means to be a storage array in a world dominated by virtualized compute.      Beginning with a requirement for shared storage in order to implement high-availability of compute loads, the virtualization platforms have been the inspiration of a prodigious amount of innovation over the last 6 years. This case study presents a candid historical and architectural perspective of the developments of the recent years. From management integration to the gamut of array-based offloads, we will discuss what the role is for intelligent storage in these datacenters, how that niche has evolved and a little of what can be expected as we look into the future.

Learning Objectives

  • Follow a company's transformation from virtualized storage to storage for virtualized platforms
  • Discuss future trends in virtualized storage systems 
  • Analyze the role of intelligent storage in datacenters

Block Storage and Fabric Management using Microsoft System Center 2012 Virtual Machine Manager and SMI-S

Madhu Jujare, Senior Software Design Engineer, Microsoft

Abstract

Virtual Machine Manager automates the discovery and provisioning of iSCSI, FC, and SAS storage using SMI-S, including the discovery and provisioning of zones in a FC fabric. Virtual Machine Manager builds on top of Windows Server 2012, which introduces native support for SMI-S providers. Microsoft is working with multiple partners to deliver storage automation capabilities across a large number of arrays: EMC, NetApp, HP, IBM, Dell, Hitachi, Fujitsu, StarWind, DotHill, and LSI.

Learning Objectives

  • Deep dive into the Windows Server 2012 Storage Management API and native support for SMI-S based providers. Understand how Virtual Machine Manager integrates with the Storage Management API and all of the block storage capabilities. Capabilities include deep discovery, storage provisioning, masking/unmasking and much more. Virtual Machine Manager also offers APIs for end-to-end automation. Developers have access to all these APIs, enabling the development of powerful tools and products across a large number of arrays that already work with Windows Server 2012, allowing you to focus more on value-add capabilities.
  • Learn how Virtual Machine Manager uses Pass Through in the Storage Management API, which allows deeper integration with SMI-S providers, to discover and manage FC devices, fabrics, zones, and zone members. This also includes the automation of zone provisioning and zone member modification. Developers can use these same patterns for their own tools and products.
  • Lessons learned by the Virtual Machine Manager team – product development across a large number of SMI-S providers that span block, NAS, and fabric; engaging and working with the storage community to define updates to SMI-S; how SMI-S can help developers be more productive by reducing the development effort to manage more devices over time.

Storage Spaces - Next Generation Virtualized Storage for Windows

Karan Mehra, Principal Software Development Engineer, Microsoft

Abstract

Storage Spaces is a new virtualized storage capability in Windows that allows storage resources to be pooled and then exposed to the system as thinly provisioned virtual disks. Pools can be clustered across multiple machines to provide appropriate levels of resiliency and availability.     Although “Storage Spaces” has been described previously in a blog (http://blogs.msdn.com/b/b8/archive/2012/01/05/virtualizing-storage-for-scale-resiliency-and-efficiency.aspx) and presented at the 2011 BUILD conference (http://www.buildwindows.com/Sessions/Speaker/Surendra-Verma), this presentation will be a technical drill-down directed at an audience of storage engineers, with more depth and detail than provided previously.

Learning Objectives

  • Understand the motivations behind some of the design choices for Storage Spaces.
  • Understand how to use the features of Storage Spaces.
  • Understand how to build systems that leverage the capabilities of Storage Spaces.
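
The thin-provisioning behavior described in the abstract can be modeled in a few lines (a deliberately simplified illustration, not the Storage Spaces implementation; the slab size and class names are hypothetical): a virtual disk advertises a large capacity but draws allocation units from the shared pool only when a region is first written.

```python
class ThinVirtualDisk:
    """Toy model of thin provisioning: the disk advertises a large
    virtual size but consumes pool slabs only on first write to a
    region. Illustrative only."""

    SLAB = 256 * 1024 * 1024  # 256 MiB allocation unit (hypothetical)

    def __init__(self, pool_slabs, virtual_size):
        self.virtual_size = virtual_size
        self.free_slabs = pool_slabs  # slabs remaining in the shared pool
        self.mapping = {}             # virtual slab index -> allocated

    def write(self, offset):
        """Allocate backing storage for the slab containing offset,
        but only the first time that slab is touched."""
        idx = offset // self.SLAB
        if idx not in self.mapping:
            if self.free_slabs == 0:
                raise RuntimeError("pool exhausted")
            self.free_slabs -= 1
            self.mapping[idx] = True

    def allocated_bytes(self):
        return len(self.mapping) * self.SLAB

# A 10 TiB virtual disk backed by a pool holding only 4 slabs:
disk = ThinVirtualDisk(pool_slabs=4, virtual_size=10 * 2**40)
disk.write(0)
disk.write(3 * ThinVirtualDisk.SLAB)
```

The key property is that physical consumption tracks what has actually been written, not the advertised size, which is what lets many thin disks safely share one pool until the pool itself nears exhaustion.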

How Can Hypervisors Leverage Advanced Storage Features?

Dr. Anupam Bhide, CEO, Co-Founder, Calsoft Inc.

Abstract

In virtual environments like VMware, Virtual Machine (VM) objects are, from a storage perspective, file operations and bulk block operations. Advanced features of NAS and SAN based arrays and servers, such as snapshot, clone, server copy, and range locking, are not being utilized. VMFS(x) on storage attached to the ESX/ESXi hosts works perfectly fine, but network usage (IP/FC/etc.) goes up significantly when the storage comes from NAS or SAN. The goal is to offload the file operations to the NAS/SAN based arrays and gain maximum benefit: increased I/O performance, better storage utilization and reduced network usage.

Learning Objectives

  • Virtual machine cloning and virtual machine deployment from a template are examples of operations that are offloaded to the NAS Server by the Full File Clone feature of VAAI
  • File Space Reservation and Extended Statistics features enable administrators to reserve space in the NFS file system for the entire capacity of a Thick Provision Lazy/Eager Zeroed virtual disk when provisioning a virtual machine
  • VAAI also offers a VMware native way to create and use the array-based Fast Clone – file-versions/file-clones/snaps for greater space and performance efficiency
  • The Full Copy feature speeds up the storage vMotion. The Block Zero feature speeds up the deployment of thick provision eager zeroed virtual disks
  • The Hardware-Assisted Locking feature avoids retries when acquiring locks. The Dead Space Reclamation feature enables the reclamation of blocks from a thin-provisioned LUN on SAN based arrays
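
The network-usage benefit of copy offload described above can be sketched with a toy model (an illustration of the principle behind primitives like VAAI Full Copy, not VMware's or any array's actual interface; the class and the 64-byte command size are assumptions): a host-side copy moves every byte over the wire twice, once to read and once to write back, while an offloaded copy sends only a small command and lets the array move the data internally.

```python
class Array:
    """Toy storage array tracking how many bytes cross the network
    for a LUN-to-LUN copy, host-driven vs. array-offloaded."""

    def __init__(self):
        self.luns = {}          # lun name -> bytes payload
        self.bytes_on_wire = 0  # cumulative network traffic

    def host_copy(self, src, dst):
        """Host reads the data, then writes it back: 2x on the wire."""
        data = self.luns[src]
        self.bytes_on_wire += len(data)  # read to host
        self.luns[dst] = data
        self.bytes_on_wire += len(data)  # write back to array

    def offloaded_copy(self, src, dst):
        """Host sends one small command; the copy happens inside
        the array, so almost nothing crosses the network."""
        self.bytes_on_wire += 64         # small XCOPY-style command
        self.luns[dst] = self.luns[src]
```

For a storage vMotion of even a modest virtual disk, the difference between twice the disk size and a handful of command bytes is exactly why the abstract emphasizes reduced network usage.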