We are pleased to announce that today we released Practicus AI v23.5 with many new and exciting features.
Some of our new features in this release include:
- Time series forecasting
- Anomaly detection
- Segmentation (clustering)
- New and improved automated feature selection
- Data mesh collaboration
- The new “my” folder
- Fast IO shared persistence across users
- Improved open data mesh architecture
- Improved connection governance
- New and improved connectors
- Improved ChatGPT integration
New modeling types
In addition to objective-based modeling, we now have three more modeling types:
- Time Series Forecasting
- Anomaly Detection
- Segmentation (Clustering)
Let’s walk through time series forecasting using the US airline passengers dataset, which comes pre-installed on our Cloud Workers.
After you click Model in the app, select Time Series Forecasting, then choose period as the date column and passengers as the objective.
After AutoML completes, you can choose how far ahead you would like to forecast (the horizon) and optionally visualize the predictions.
You will immediately see the actual and predicted values on a plot such as the one below.
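To make the idea of a forecast horizon concrete, here is a minimal seasonal-naive sketch in plain Python. This is an illustration only, not the Practicus AI API: the AutoML engine selects and tunes real forecasting models, while this toy simply repeats the last observed season.

```python
# Minimal seasonal-naive forecast: predict each future point by repeating
# the value from one season (12 months) ago. Illustrates the "horizon"
# concept only; it is not the Practicus AI forecasting engine.

def seasonal_naive_forecast(history, horizon, season=12):
    """Forecast `horizon` future points by repeating the last season."""
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    last_season = history[-season:]
    return [last_season[i % season] for i in range(horizon)]

# Toy monthly passenger counts (two seasons of history)
passengers = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
              115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140]

forecast = seasonal_naive_forecast(passengers, horizon=6)
print(forecast)  # the first 6 months, repeated from the last season
```

A real forecasting model would also capture trend and noise; the point here is only that the horizon parameter controls how many future periods are produced.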
New Automated Feature Selection
Practicus AI already had automated feature selection functionality. We completely renewed our feature selection engine and added the capability to customize it by setting a target total importance percentage. The default is 95%.
A lower total importance, such as 80%, can be useful, especially if you would like to expose your models as APIs. If you have many features, e.g., several hundred, and would like to find the optimal feature count, the new functionality can help you choose.
You can view the feature selection summary and importance table using the integrated MLflow UI or the Jupyter notebook environment that comes pre-configured with Practicus AI.
You can now predict using fewer features. Your APIs will be simplified too.
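To show what a target total importance percentage means, here is a small sketch that keeps the highest-importance features until their cumulative share first reaches the target. The feature names and importance values are made up for illustration; Practicus AI computes real importances internally during AutoML.

```python
# Select features by cumulative importance: rank features by importance,
# then keep them, highest first, until the cumulative share of total
# importance first reaches the target (default 95%).

def select_by_total_importance(importances, target=0.95):
    """Return the smallest top-ranked feature set whose cumulative
    importance share first reaches `target`."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(v for _, v in ranked)
    selected, cumulative = [], 0.0
    for name, value in ranked:
        selected.append(name)
        cumulative += value / total
        if cumulative >= target:
            break
    return selected

# Hypothetical importances for illustration only
importances = {"age": 0.40, "income": 0.30, "tenure": 0.20,
               "zip_code": 0.07, "referral": 0.03}

print(select_by_total_importance(importances, target=0.80))
# with an 80% target, age + income + tenure (0.90 cumulative) suffice
```

Lowering the target from 95% to 80% drops the two weakest features here, which is exactly the trade-off you would tune when simplifying a model API.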
The new "my" Folder
Practicus AI users could already persist (save) and share their files using object storage systems such as S3.
For an improved user experience, Practicus AI now allows you to auto-mount (attach) fast IO capable distributed storage systems on your Data Mesh.
Your selected files are preserved across Cloud Worker termination and restarts, and they are shared in real time between multiple Cloud Workers.
Why would you need this? Consider the following scenario:
You are using a Cloud Worker named “Worker-24”, you then load data from a Data Warehouse, make changes, and save to your “my” folder.
You decide to further experiment using a Jupyter notebook and save the notebook file to your “my” folder as well.
You then realize you need more capacity, say 32 CPUs instead of the 8 you are already using, and create a new Cloud Worker named “Worker-25”.
From Worker-25, you can read and write the same files simultaneously, including notebooks and supporting files such as .csv files.
This approach works several orders of magnitude faster than a cloud drive app such as Google Drive, and it is 100% private to your organization.
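The scenario above boils down to two workers seeing the same files through one shared path. The sketch below simulates that with a temporary directory standing in for the mounted “my” folder (the real mount location is managed by Practicus AI), with two functions standing in for Worker-24 and Worker-25.

```python
# Illustration of the shared "my" folder idea: two processes read and
# write the same path. A temp directory simulates the mount; the actual
# "my" folder path is managed by Practicus AI.
import csv
import tempfile
from pathlib import Path

my_folder = Path(tempfile.mkdtemp())  # stands in for the mounted "my" folder

def worker_24_saves(path):
    # Worker-24 loads data from the warehouse and saves it (simplified here)
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows([["month", "passengers"], ["Jan", "112"]])

def worker_25_reads(path):
    # Worker-25, a separate Cloud Worker, sees the same file immediately
    with open(path, newline="") as f:
        return list(csv.reader(f))

data_file = my_folder / "passengers.csv"
worker_24_saves(data_file)
print(worker_25_reads(data_file))  # [['month', 'passengers'], ['Jan', '112']]
```

Because both workers operate on one mounted filesystem, there is no upload/download step, which is where the speed advantage over a cloud drive app comes from.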
Shared and fast IO persistence across users
You can use the same “my” folder technology for shared, fast, and simultaneous persistence across multiple users as well. This is in addition to our existing object storage (e.g., S3) functionality.
Administrators have fine-grained access control, and can easily define which users or user groups have read or write access to any shared persistence medium. They can define as many as required, and for improved performance, these shares can reside on different storage systems. You can achieve many gigabytes per second of IO.
Similar to the “my” folder experience, multiple users can then read/write to the same files and folders simultaneously, and with high throughput. This feature will improve collaboration in your organization.
Improved Open Data Mesh Architecture
You could already mix and match different deployment types with Practicus AI, e.g. enterprise Kubernetes on any cloud, or on-prem, Docker Desktop, or AMIs through the AWS Marketplace.
We further improved the data mesh features, allowing users to seamlessly and simultaneously work with data sources distributed across cloud regions, cloud vendors, and on-prem environments, as if they were all in the same location.
Improved Connection (Data Catalog) Governance
We added new functionality that gives administrators more flexibility and visibility to govern connections (data catalogs) across the organization, even when using multiple data mesh deployments on various clouds or on-prem.
Administrators also have new features to centrally manage connections shared across users.
New and improved connectors
We improved the performance of our existing connectors and added several new ones!
Improved ChatGPT integration
Last but not least, we made significant prompt engineering improvements to our OpenAI ChatGPT integration.
The changes include optimized context sharing, better code cleaning, and improved token and cost management.
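To give a feel for what token and cost management involves, here is a rough bookkeeping sketch. The 4-characters-per-token rule of thumb and the per-1K-token price are illustrative assumptions, not Practicus AI's or OpenAI's actual figures; production integrations use a real tokenizer and current pricing.

```python
# Rough sketch of token and cost bookkeeping for an LLM integration.
# Both the character-per-token heuristic and the price are assumptions
# made for illustration only.

def estimate_tokens(text):
    """Very rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def estimate_cost(prompt, completion, usd_per_1k_tokens=0.002):
    """Return (total_tokens, estimated_cost_usd) for one request."""
    tokens = estimate_tokens(prompt) + estimate_tokens(completion)
    return tokens, tokens / 1000 * usd_per_1k_tokens

tokens, cost = estimate_cost("Summarize this dataset." * 10,
                             "The data shows a seasonal trend.")
print(tokens, round(cost, 5))
```

Tracking estimates like these per request is what makes it possible to cap spend and trim oversized prompt context before a call is made.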