Pythonic Analysis Agent
LLM-powered pythonic conversational analytics
DataAnalyzr’s pythonic analysis provides a comprehensive toolkit for machine learning-based analysis built on Python’s extensive ecosystem of libraries such as NumPy, pandas, scikit-learn, and statsmodels. Users can tap into these tools and techniques to derive actionable insights from their data and solve complex analytical problems.
Getting started
Data analysis with DataAnalyzr involves four simple steps:

1. **Install lyzr**: install the `lyzr` package with the `data-analyzr` variant using pip.
2. **Create an instance**: create an instance of the `DataAnalyzr` class with the desired analysis type and API key.
3. **Load data**: load data from files or from Redshift, PostgreSQL, or SQLite databases using the `get_data` method.
4. **Ask a question**: ask a question using the `ask` method to generate visualizations, insights, recommendations, and tasks.
Let’s go over these steps one by one for pythonic data analysis.
Installation
The first step is to install lyzr using pip. To use lyzr’s data analysis capabilities, install it with the `data-analyzr` variant:
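```bash
# Installs lyzr with the data-analyzr extra, as described above.
pip install "lyzr[data-analyzr]"
```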
This will install the `lyzr` package alongside all dependencies required for data analysis.
Creating an instance
The next step is to create a class instance.
The `analysis_type` parameter can take three options:

- `ml`: for analysis with Python code, using pandas, scikit-learn, and other similar packages.
- `sql`: for SQL analysis.
- `skip`: if you want to skip the analysis altogether and get insights directly from the uploaded data.
You can also pass your API key with the `api_key` parameter. This parameter is optional and is given as an alternative to setting the environment variable. For details on all the options available when instantiating the `DataAnalyzr` class, visit the API reference.
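A minimal instantiation sketch (the `from lyzr import DataAnalyzr` import path and the key placeholder are assumptions):

```python
from lyzr import DataAnalyzr

# Create an instance for pythonic (ml) analysis.
analyzr = DataAnalyzr(
    analysis_type="ml",  # one of "ml", "sql", or "skip"
    api_key="sk-...",    # optional; alternative to setting the environment variable
)
```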
Loading data
DataAnalyzr provides multiple options for connecting to your data. Whether you are working with data files in CSV, Excel, or JSON format, connecting to an online database such as Redshift or PostgreSQL, or using a local SQLite database, there is an option for you.
The class method used to connect with data is `get_data`. It takes three parameters, `db_type`, `db_config`, and `vector_store_config`, whose values depend on the format of the input data.
Let’s look at a couple of examples:
Loading data from files
Collect all your data files in a list of dictionaries, with their names, paths, and keyword arguments. Then pass this list in `db_config` when calling the `get_data` method.
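A hedged sketch of what the call might look like, assuming `db_type="files"` and a `db_config` holding a list of dataset dictionaries (the key names `datasets`, `name`, `value`, and `kwargs` are assumptions):

```python
# Hypothetical file-loading call; the db_config key names are assumptions.
analyzr.get_data(
    db_type="files",
    db_config={
        "datasets": [
            {"name": "Sales", "value": "data/sales.csv"},
            {
                "name": "Products",
                "value": "data/products.xlsx",
                "kwargs": {"sheet_name": 0},  # keyword arguments for the file reader
            },
        ]
    },
    vector_store_config={},
)
```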
The value of `db_type` tells the system:

- which type of data it will need to explore
- what to expect in the `db_config` parameter
Loading data from Redshift
As a first step, you will need to collect all the Redshift details.
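A sketch of what the call might look like (the `db_type="redshift"` value and the `db_config` key names are assumptions; credentials are placeholders):

```python
# Hypothetical Redshift connection; replace the placeholders with your details.
analyzr.get_data(
    db_type="redshift",
    db_config={
        "host": "examplecluster.abc123.us-west-2.redshift.amazonaws.com",
        "port": 5439,
        "user": "awsuser",
        "password": "<your-password>",
        "database": "dev",                  # only one database may be passed
        "schema": ["public"],               # schemas are passed as a list
        "tables": ["orders", "customers"],  # tables are passed as a list
    },
    vector_store_config={},
)
```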
Once again, note that the value of `db_type` tells the system:

- which type of data it will need to explore
- what to expect in the `db_config` parameter
While you may pass only one `database` in `db_config`, the number of tables and schemas is not limited; they are passed as lists. If no `schema` or `tables` are passed, all the tables from the `public` schema are taken.
Loading data from PostgreSQL
The implementation for PostgreSQL is very similar to that for Redshift. Start by collecting all the DB details.
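A sketch mirroring the Redshift example (the `db_config` key names are again assumptions):

```python
# Hypothetical PostgreSQL connection; only db_type differs from Redshift.
analyzr.get_data(
    db_type="postgres",
    db_config={
        "host": "localhost",
        "port": 5432,
        "user": "postgres",
        "password": "<your-password>",
        "database": "analytics",
        "schema": ["public"],
        "tables": ["orders"],
    },
    vector_store_config={},
)
```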
Note that the value of `db_type` is now `postgres`, while everything else is the same.
Loading data from SQLite
A local SQLite database can also be used for analysis with DataAnalyzr.
You only need to pass the path to this database in the `db_config` parameter. Alternatively, if you have a URL which holds the SQLite database, you can pass this URL as the value of `db_path`.
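A sketch, assuming `db_type="sqlite"` (the `db_path` key is named above):

```python
# Hypothetical SQLite connection; db_path may be a local path or a URL.
analyzr.get_data(
    db_type="sqlite",
    db_config={"db_path": "data/local.db"},
    vector_store_config={},
)
```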
Getting results
You can use the `DataAnalyzr` object to perform an analysis on the DataFrame by passing an analysis query to the `ask` method.
This function enables you to ask questions directly related to the data at hand, allowing DataAnalyzr to process the inquiry and provide the corresponding visualisation, insights, recommendations, and tasks.
The simplest such implementation looks like this:
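```python
# A minimal sketch; the user_input parameter name is an assumption.
result = analyzr.ask(user_input="Which product category drives the most revenue?")
```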
Here, `result` has the keys `visualisation`, `insights`, `recommendations`, and `tasks`.
You can control the outputs received from `ask`:
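```python
# Request only a subset of the outputs; the accepted values are assumed
# to mirror the result keys.
result = analyzr.ask(
    user_input="Which product category drives the most revenue?",
    outputs=["insights", "recommendations"],
)
```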
Here, `result` still has the keys `"visualisation"`, `"insights"`, `"recommendations"`, and `"tasks"`, but their values are changed.
You can also specify the context for the analysis and the generation of outputs. For example, you could let the system know that this analysis is for a specific user profile, or that the outputs should be presented in a specific way.
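For example (a sketch; the `context` parameter is assumed to accept free text):

```python
# Tailor the outputs to a specific audience via context.
result = analyzr.ask(
    user_input="How did sales trend over the last two quarters?",
    context="The reader is a non-technical sales manager; keep insights brief.",
)
```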
Getting visualisations
To retrieve plots from your analysis, use the `ask` method and pass `"visualisation"` in your `outputs` parameter.
The dictionary returned has a key `"visualisation"` which contains the path to the PNG image. By default, visualization images are saved to `./generated_plots/plot.png`, but this can be controlled using the `plot_path` parameter. Here’s an example:
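```python
# A sketch; plot_path is named in the text, user_input is an assumption.
result = analyzr.ask(
    user_input="Plot monthly revenue by product category.",
    outputs=["visualisation"],
    plot_path="plots/revenue_by_category.png",
)
```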
The result then has a `"visualisation"` key which is simply the value of `plot_path`:
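```python
# The returned value echoes the plot_path passed in the call above.
print(result["visualisation"])  # plots/revenue_by_category.png
```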
It is also possible to pass a directory as `plot_path`; the generated image is then saved in that directory. Additionally, you may pass a context for the image generation.
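A sketch combining both (whether a directory `plot_path` and a `context` can be passed together in one call is an assumption):

```python
# Save the plot into a directory and steer its styling via context.
result = analyzr.ask(
    user_input="Plot monthly revenue by product category.",
    outputs=["visualisation"],
    plot_path="plots/",  # a directory; the generated image is saved inside it
    context="Use a bar chart with clear axis labels.",
)
```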
You can then view the image using `pillow` or `matplotlib`:
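```python
# Display the generated PNG with matplotlib.
import matplotlib.image as mpimg
import matplotlib.pyplot as plt

img = mpimg.imread(result["visualisation"])
plt.imshow(img)
plt.axis("off")
plt.show()
```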