As “big data” becomes increasingly ubiquitous and important to a variety of organizations, many of the existing database systems and platforms have become too cumbersome to set up and maintain: On-site systems can be complex and difficult to scale; and other “big data” solutions require constant tuning and administrators with the knowledge on how to do so. Though these obstacles can be overcome, the cost of such solutions can be enormous – even to large and fiscally sound organizations. Let me take you on a journey …
Setting the Scene
Imagine an average looking office, with people milling about and computers droning and beeping noise in the background. You notice a bespectacled man, hunched over a pair of monitors blinking yellow and red with warning messages. The man is sweating profusely as he reads a message on his computer:
“Error, error. Insufficient system memory. Error, Error …”
The man raises his arms to the sky in despair. The front door bursts open, and a cold breeze beckons in a vague figure, glowing in blue, who silently raises a FLAT PALM, blowing a cloud of Snowflakes onto the monitors. The yellow and red messages are now green, and the man leans back into his chair, smiling. All is right in the world. Birds chirp in the distance.
Okay, it’s not exactly like that, but it may feel like it with Snowflake.
Snowflake Explained
Snowflake is a cloud-based data warehouse that provides an unbounded platform for storing and retrieving data. Unlike traditional single-cluster shared-disk/shared-nothing architectures, Snowflake has as a multi-cluster, shared data architecture that is dynamic and highly scalable thanks to enterprise class cloud-based storage systems. These multiple clusters all access the same underlying data, but they run independently and without contention, enabling heavy queries and operations to run simultaneously without issue.
As is tradition, security is probably of the utmost importance to you and your business. Snowflake automatically encrypts all data. In fact, unencrypted data isn’t allowed whatsoever – so don’t even try it, okay pal? Multi-factor authentication and federated authentication are both available. Snowflake offers granular access control on all objects and actions. This means all communications between users and the database are encrypted, and access control auditing exists on everything from data objects to actions within the database itself.
Snowflake goes the distance for its clients, using third-party certification and validation to ensure security standards like HIPAA are met. Basically, you don’t have to worry about anything. Sounds pretty great, doesn’t it? Simply put, just load data into Snowflake and query that data – and they’ll do all the rest. It’s really that simple. Now that I have your attention, let’s cover some of the basic concepts of Snowflake.
Good Ol’ SQL
Snowflake is fundamentally built to be a complete SQL database. It is a columnar-stored relational database and works well with Tableau, Excel and many other tools familiar to end users. Snowflake has its own query tool, supports role-based security, multi-statement transactions, full DML, windowing functions and everything else expected in a SQL database. It’s the SQL you know and love – be gone, lesser-than imitators!
Easy Like Sunday Morning
Snowflake requires zero management from the end user, in stark contrast to traditional database systems. There are no knobs to turn, indexes to tune, partitions to build – Snowflake handles all of this for you automatically. This simplifies workflow allows users to simply load their data and run queries. You don’t have to have an army of resource-hogging database administrators constantly monitoring and tuning query performance and hardware.
Snowflake has an impressive dynamic caching system: All query results are stored for 24 hours (90 days for enterprise customers) or until the underlying data changes. If a new query matches an entry in the results cache, then the matching result is returned immediately without using compute. Think of the possibilities …
Any Flavor of Data You Like
Leveraging the cloud, Snowflake can easily hold all your data. All the different types of data that your business generates can be stored in one instantly accessible place. Snowflake can handle structured and semi-structured data alike; one of the more powerful features of Snowflake is its ability to load semi-structured data (like JSON, XML and Avro) in directly without any preparation or schema definition by the end user.
Snowflake natively treats semi-structured data as if it were relational and structured; it is columnarized and stored efficiently, metadata is extracted and encrypted and made available for querying just like your structured data. Yes, you can query your JSON data with SQL. Yes, you can join it to other structured data. No, it won’t make you a sandwich. You must do that yourself.
Such Scalability. Wow.
Built with ease of use and access in mind, Snowflake is available at any scale for all your users and applications. Because Snowflake brilliantly separates storage, compute and metadata management, trillions of rows can be sliced up with ease by concurrent users. Storage and compute can be scaled up and down independently and immediately, and the metadata service will automatically scale up and down as necessary.
In the Snowflake environment, you can load new data throughout the day as queries are being ran, which means things like shutting down database operations for overnight batch loads are not necessary. With Snowflake, you can create independent clusters and “environments” for your user groups and applications that don’t impact each other, yet all point back to the same common pool of data.
Waste Not, Want Not
Snowflake’s pricing is simple: you only pay for what you use. Snowflake is fully elastic, gone are the days of worrying about shifting data in and out of cold storage. Data is stored at a price that is comparable to cloud storage systems like AWS S3. You can store petabytes of data, pay for just the compressed volume of that data and then pay for the compute portion of the system when you need it. If you don’t need the data for a week, you only pay for the storage. If you need 25x more compute than normal for a brief period, you can easily scale up and back down and only pay for the delta. With the money you’ll save, you can probably afford to R&D that sandwich-making machine.
Testing the Waters
So, how does Snowflake stack up against traditional database systems and other, more recent alternatives? Being one of the newest solutions in the market, Snowflake was designed after the advent of the “cloud” and takes full advantage of recent technology. Obviously, the traditional systems were originally developed well before. Snowflake is beyond cutting edge – it’s bleeding edge, but with a trained staff of doctors and surgeons hovering around … in a clean room in a hospital … and the wound isn’t severe at all.
Legacy data warehouse products like Netezza and Vertica are built on old technology, are difficult to scale, have costly support and licensing and place the cost of management on you. Backup costs, disaster recovery and security are all the responsibility of the customer. These on-site solutions obviously have scalability limitations, which present potential for bottlenecks at multiple levels. With Snowflake, there is no trade-off between scalability and productivity – you can have your cake and eat it, too.
As mentioned earlier, another feature that puts Snowflake above these traditional solutions is its ability natively handle both structured and semi-structured data. Additionally, Snowflake’s automatic maintenance and database administration means huge savings over these products. It’s important to note that Snowflake is not an OLTP replacement. However, Snowflake is an extremely powerful back-end complement to OLTP as a cloud-based virtual warehouse for any large data sets.
Snowflake has some distinct advantages over NoSQL databases like Cassandra and mongoDB. Snowflake’s native support for semi-structured data means your JSON, XML, Parquet and Avro data can be loaded and ready for querying in minutes, compared to the hours or days of pre-processing that is required in NoSQL databases. As Snowflake loads semi-structured data, it records metadata which is then used in query plans and query executions, providing optimal performance and allowing for the querying of semi-structured data using common SQL. You can perform all your favorite SQL functions on structured and semi-structured data in the same query.
Other cloud-based data warehouses like Redshift and Azure still come up short in a variety of important categories. Both Redshift and Azure are difficult to manage, it’s not uncommon for customers of these products to spend hours (often weekly) doing maintenance like updating metadata, vacuuming, etc. Google’s BigQuery has its weaknesses too; it is not truly a relational database like Snowflake, has concurrency limitations and vague pricing. BigQuery is also a “black box” – customers have no control over the resources needed to complete initiated queries, which means inconsistent performance that has a direct impact on service-level agreements.
Snowflake should be considered as a replacement for open-source projects like Hadoop. Snowflake has the inherent advantages of being cloud-based, and it is a considerably simpler environment than Hadoop with overwhelmingly more automated management. Snowflake lets you focus more on using your data rather than managing the system your data lives in. Also, Hadoop is just a weird word and you won’t have to say it anymore.
Exit Music (for a Blog)
Snowflake provides unrivaled performance, mind-blowing levels of concurrency and the utmost simplicity. Snowflake’s unique architecture allows you to start small and grow large with a few simple clicks. Combining all the above with Snowflake’s awesome features like dynamic caching, zero-copy data cloning, diverse data and application integration, continuous data protection and much more, Snowflake clearly needs to be considered as a solution for any business working with “big data.”
I’ll be taking a closer look at Snowflake in future blog posts, but I understand if you just can’t wait. If you are curious to learn more about Snowflake, how it interacts with your existing products or how it can help replace your existing system, please contact us today and let us show you just how transformative Snowflake can be.