SQL for ANY data, ANY size, GPU accelerated, no setup needed!
We are now bringing an extremely flexible and powerful Big Data SQL engine to your fingertips. And best of all, you don’t need to configure a library or product, or use a separate service.
Let us go through how it all works.
Before v22.9, you could already run a SQL query in Practicus AI if your data source, such as a database or data warehouse, supported SQL.
With v22.9, we are taking SQL to any kind of tabular data source.
Now let’s see an example of running queries on a data store that doesn’t support SQL, such as S3, a .csv file, or an Excel file on your laptop.
First, let’s load some data, for instance, 2020 sales data stored as Parquet on your S3 data lake.
You can click the SQL Query button to start experimenting right away.
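As a rough mental model (this is not Practicus AI’s actual implementation, which uses its own engines), running SQL over a source with no SQL support, like a .csv file, amounts to loading the rows into an in-memory SQL engine and querying them. A minimal sketch using only the Python standard library, with a made-up sample standing in for the 2020 sales data:

```python
import csv
import io
import sqlite3

# Hypothetical sample: a tiny "2020 sales" CSV (stands in for S3/Parquet data)
csv_text = """region,amount
East,120.0
West,340.5
East,80.25
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))

# Load the tabular data into an in-memory SQL engine
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(r["region"], float(r["amount"])) for r in rows],
)

# Now any SQL works, even though the source (a .csv file) has no SQL support
result = con.execute(
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(result)  # [('East', 200.25), ('West', 340.5)]
```

The app does the equivalent of this loading step for you when you open the file, so by the time you click SQL Query the data is already queryable.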
Please note that SQL is just another data preparation step, like the ones you create by picking actions from the Prepare menu, writing formulas, adding custom Python code, etc.
After authoring your SQL statement, you can click the Test SQL button (or press Shift + F10), and the result will be displayed for a small sample (the top 10,000 rows by default).
Once ready, you can click Apply SQL to apply your query to the entire in-memory experimentation data set that you loaded, for instance, 1 million rows.
You can keep adding SQL steps, one after another. In a traditional database you would most likely keep updating, and often complicating, a single SQL statement. Let’s add one more SQL step.
Your second query will run on the output of the first one.
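Conceptually, chaining SQL steps is like defining each step as a view over the previous one, so later queries see only the prior step’s output. A small illustrative sketch (not the actual engine, and with made-up sample data):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 120.0), ("West", 340.5), ("East", 80.25)],
)

# Step 1: aggregate per region
con.execute(
    """CREATE VIEW step1 AS
       SELECT region, SUM(amount) AS total FROM sales GROUP BY region"""
)

# Step 2 runs on the OUTPUT of step 1, not on the raw table
step2 = con.execute("SELECT region FROM step1 WHERE total > 300").fetchall()
print(step2)  # [('West',)]
```

Each step stays small and readable, instead of one ever-growing nested statement.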
Let’s say you forgot to add one more aggregation in the first SQL query. You can click the Steps button to view all of the data preparation steps, and then click the Edit button to quickly make changes to the SQL step.
Your change to the first SQL step will cascade to the second SQL step and all other steps instantly. You do not need to reload data, since this all happens on in-memory cached data with checkpoints, which improves performance and user experience.
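The cascading behavior can be pictured as a toy checkpoint pipeline (again, a sketch under our own assumptions, not Practicus AI internals): each step’s output is cached, so editing step 1 only recomputes from that point on, with no reload of the source.

```python
# Toy sketch of cached checkpoints: each step transforms the data,
# outputs are cached, and editing step i only recomputes steps i..n.
data = list(range(1, 1_000_001))  # stands in for the loaded sample

steps = [
    lambda d: [x for x in d if x % 2 == 0],  # step 1: filter
    lambda d: d[:10],                        # step 2: runs on step 1's output
]

def run(steps, data):
    checkpoints = [data]
    for step in steps:
        checkpoints.append(step(checkpoints[-1]))
    return checkpoints

checkpoints = run(steps, data)

# Edit step 1: no reload of `data` needed, recompute from the cached input
steps[0] = lambda d: [x for x in d if x % 3 == 0]
for i, step in enumerate(steps, start=1):
    checkpoints[i] = step(checkpoints[i - 1])

print(checkpoints[-1])  # [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]
```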
You can continue with other data preparation steps, or analyze the data by visualizing as usual.
As a last (optional) step, you can click the deploy button to create production-grade, clean code with the SQL and all other steps as a bundle, and deploy it anywhere you like. Instead of the sample we worked on interactively (2020 sales), the deployment code can run on the entire data set, and on larger cloud capacity as well.
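The generated deployment bundle is, in spirit, your interactive steps packaged as one reusable routine that accepts any input, whether the small sample or the full data set. A hypothetical sketch of that shape (the function name and data layout are our assumptions, not the actual generated code):

```python
import sqlite3

def pipeline(rows):
    """Hypothetical deployed pipeline: rows is an iterable of
    (region, amount) tuples - a sample or the entire data set."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    # The same SQL steps authored interactively, bundled together
    con.execute(
        """CREATE VIEW step1 AS
           SELECT region, SUM(amount) AS total
           FROM sales GROUP BY region"""
    )
    return con.execute(
        "SELECT region, total FROM step1 ORDER BY region"
    ).fetchall()

# The same code runs on a small sample or on the full 2020 data set
print(pipeline([("East", 120.0), ("West", 340.5)]))
```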
Other notable points worth mentioning:
- In this demo we used a CPU-powered cloud node; you can use GPU-powered ones, especially to aggregate a billion+ rows almost instantly
- We did not install or configure any extra software, product, or service. You can click the cloud button in the app as usual, launch a new Practicus AI cloud node (private to your AWS account), and it will be ready to run SQL on any tabular data source.
- Our SQL feature needs the cloud, and it is very easy to switch back and forth. For instance, you can open a .csv file on your laptop using the app, make some changes, and then click the SQL button. The app will offer to seamlessly move your data to a cloud node of your choice and apply your existing steps, so you can continue running SQL statements uninterrupted. If you do not have a cloud node running, the app will offer to launch a new one first, which will be ready in 1-2 minutes. Bottom line: we will do all the heavy lifting for you, so you can focus on your data, SQL, and AI models!
New Cloud Worker Configuration Pane
- Switch to an entirely new cloud node or cloud region: While working on your data, if you realize you need a larger sample size or a different architecture (CPU vs. GPU), or if you are simply running out of memory, you can now click to move to a different cloud node, even in a different cloud region around the world, and continue experimenting as if nothing happened. We will move your work (steps) and sync where you left off in a matter of seconds.
- Change Data Source: Let’s explain with an example: you can load a local .csv file, start making changes, and then switch to S3 or a database query; your steps will seamlessly move to the new data source.
- Change Sampling Settings: Say you started with a thousand rows and now need a million. Or you sampled the top rows and now need random sampling. You don’t need to restart or reload anything; simply change the sampling settings and continue your experimentation without any disruption.
- Change Data Engine (Advanced): You can start with any data engine, e.g. Pandas (the default), and then mid-way through seamlessly switch to another engine such as DASK, Spark, RAPIDS for GPUs, or RAPIDS + DASK for distributed GPUs.
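The sampling options above (top vs. random, different sizes) can be sketched as a tiny helper, purely to illustrate the two modes; the function name and parameters here are our own, not the app’s API:

```python
import random

full_data = list(range(100_000))  # stands in for the full source

def sample(data, size, method="top", seed=42):
    """Hypothetical sampling helper: 'top' reads the first rows,
    'random' draws rows without replacement."""
    if method == "top":
        return data[:size]
    rng = random.Random(seed)  # seeded for reproducibility
    return rng.sample(data, size)

top_1k = sample(full_data, 1_000)                # started with the top 1,000 rows
rand_10k = sample(full_data, 10_000, "random")   # switched to a random 10,000
print(len(top_1k), len(rand_10k), top_1k[:3])    # 1000 10000 [0, 1, 2]
```

In the app, changing these settings re-samples behind the scenes and replays your existing steps on the new sample.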
New cloud region, and larger capacity all around the world
- Practicus AI is now available in UAE!
- We now support 22 geographical regions around the world, from Cape Town to Stockholm to São Paulo to Tokyo.
- We also started offering several new AWS cloud node instance types in many regions.
- With the new cloud node types and others, you can have all the power you need for the most complex AutoML problems, and faster data processing, down from hours to seconds, for all your big data workloads. Available for both interactive workloads (GUI App) and batch processing (e.g. SDK + Airflow)
- We aim to support new instance types as soon as they become available in a given geography; please let us know if we missed any in your region.
- DASK-distributed performance improvements: For large scale distributed data processing including CPUs and GPUs
- Apache Spark performance improvements
- S3 read/write performance improvements for all data engines (Pandas, DASK, RAPIDS, RAPIDS multi GPU, Spark)
- Upgraded to the Qt 6 C++ library, which gives you an even faster GUI App experience
- … and many other improvements!