Getting started
Data analysis with DataAnalyzr includes four simple steps:
1. Install lyzr: install the lyzr package with the data-analyzr variant using pip.
2. Create an instance: create an instance of the DataAnalyzr class with the desired analysis type and API key.
3. Load data: load data from files, Redshift, PostgreSQL, or SQLite databases using the get_data method.
4. Ask a question: ask a question using the ask method to generate visualizations, insights, recommendations, and tasks.
Installation
The first step is to install lyzr using pip. To use lyzr's data analysis capabilities, install it with the data-analyzr variant:
pip install "lyzr[data-analyzr]"
This installs the lyzr package alongside all dependencies required for data analysis.
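To confirm from Python that the package is available, the standard library is enough. This is a minimal sketch: lyzr_version is a hypothetical helper name, and it only checks for the lyzr distribution itself, not for every dependency pulled in by the data-analyzr extra.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version


def lyzr_version():
    """Return the installed lyzr version, or None if the package is not importable."""
    if importlib.util.find_spec("lyzr") is None:
        return None
    try:
        return version("lyzr")
    except PackageNotFoundError:
        # Importable but without distribution metadata (e.g. a bare source checkout).
        return None


print(lyzr_version() or "lyzr is not installed")
```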
Creating an instance
The next step is to create an instance of the DataAnalyzr class. The analysis_type parameter can take three options:
- ml: for analysis with Python code, using pandas, scikit-learn and other similar packages.
- sql: for SQL analysis.
- skip: if you want to skip the analysis altogether and get insights directly from the uploaded data.
Note that you can also provide an api_key parameter. This parameter is optional and serves as an alternative to setting the API key in an environment variable.
For documentation on SQL-based analysis, refer to the SQL Analysis Agent documentation. For more details on the DataAnalyzr class, visit the API reference.
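The choice between the three analysis_type values and the optional api_key can be sketched as a small pre-check. This is an illustration under assumptions: resolve_instance_args is a hypothetical helper, and OPENAI_API_KEY is an assumed environment variable name that this page does not confirm. Only the commented-out constructor call uses the names given above.

```python
import os

# The three values accepted by analysis_type, as listed above.
ANALYSIS_TYPES = ("ml", "sql", "skip")


def resolve_instance_args(analysis_type, api_key=None, env_var="OPENAI_API_KEY"):
    """Hypothetical helper: validate analysis_type and choose an API key.

    An explicit api_key wins; otherwise the key is read from env_var.
    OPENAI_API_KEY is an assumed variable name, not confirmed by this page.
    """
    if analysis_type not in ANALYSIS_TYPES:
        raise ValueError(
            f"analysis_type must be one of {ANALYSIS_TYPES}, got {analysis_type!r}"
        )
    key = api_key if api_key is not None else os.environ.get(env_var)
    if not key:
        raise ValueError(f"pass api_key or set the {env_var} environment variable")
    return {"analysis_type": analysis_type, "api_key": key}


kwargs = resolve_instance_args("ml", api_key="sk-your-key")
# With lyzr installed, these keyword arguments would go to the constructor:
# from lyzr import DataAnalyzr
# analyzr = DataAnalyzr(**kwargs)
```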
Loading data
DataAnalyzr provides multiple options for connecting to your data. Whether you work with data files in formats such as CSV, Excel or JSON, connect to a remote Redshift or PostgreSQL database, or use a local SQLite database, there is an option for you. The method used to connect to data is get_data.
It takes three parameters, db_type, db_config and vector_store_config, whose values depend on the format of the input data.
Let’s look at a couple of examples:
Loading data from files
Collect all your data files in a list of dictionaries, each giving a file's name, path and keyword arguments. Then pass this list when calling the get_data method.
The db_type parameter tells the system:
- which type of data it will need to explore
- what to expect in the db_config parameter
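Building the list of file descriptions can be sketched in plain Python. This is a minimal sketch under assumptions: file_entry and build_files_config are hypothetical helpers, the dictionary keys "name", "path" and "kwargs" are guesses at the expected shape, and the db_type value "files" in the commented call is assumed; check the API reference for the exact format.

```python
import tempfile
from pathlib import Path


def file_entry(name, path, **kwargs):
    """Hypothetical helper: describe one data file.

    The keys "name", "path" and "kwargs" are assumptions about the
    expected shape; kwargs would be forwarded to the file reader.
    """
    return {"name": name, "path": str(path), "kwargs": kwargs}


def build_files_config(entries):
    """Check names are unique and every file exists before calling get_data."""
    names = [entry["name"] for entry in entries]
    if len(names) != len(set(names)):
        raise ValueError("dataset names must be unique")
    missing = [entry["path"] for entry in entries if not Path(entry["path"]).is_file()]
    if missing:
        raise FileNotFoundError(f"missing data files: {missing}")
    return entries


# Create a small CSV so the example runs on its own.
data_dir = Path(tempfile.mkdtemp())
(data_dir / "sales.csv").write_text("region,revenue\nnorth,100\nsouth,80\n")

db_config = build_files_config([file_entry("sales", data_dir / "sales.csv", sep=",")])
# Assumed call shape; the db_type value for files is not confirmed here:
# analyzr.get_data(db_type="files", db_config=db_config, vector_store_config={})
```

Validating paths up front means a typo in a file name fails immediately, before any analysis request is made.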
Loading data from Redshift
As a first step, collect all the Redshift connection details.
As with files, db_type tells the system:
- which type of data it will need to explore
- what to expect in the db_config parameter
Unlike the database in db_config, the number of schemas and tables is not limited: both are passed as lists. If no schema and tables are passed, all tables from the public schema are used.
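The Redshift details can be gathered into a single dictionary, with schemas and tables as lists. This is a sketch under assumptions: redshift_config is a hypothetical helper, the key names are guesses rather than the documented db_config format, the connection values are placeholders, and the db_type value "redshift" in the commented call is assumed.

```python
def redshift_config(host, user, password, database, port=5439, schema=None, tables=None):
    """Hypothetical helper that assembles a Redshift db_config.

    The key names are assumptions. schema and tables are lists of any
    length; if both are left empty, DataAnalyzr reads every table in the
    public schema, as described above.
    """
    return {
        "host": host,
        "port": port,  # 5439 is Redshift's default port
        "user": user,
        "password": password,
        "database": database,
        "schema": list(schema or []),
        "tables": list(tables or []),
    }


cfg = redshift_config(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    user="analyst",
    password="change-me",
    database="dev",
    schema=["public", "sales"],
    tables=["orders", "customers"],
)
# Assumed call shape:
# analyzr.get_data(db_type="redshift", db_config=cfg, vector_store_config={})
```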
Loading data from PostgreSQL
The implementation for PostgreSQL is very similar to that for Redshift. Start by collecting all the database details.
Note that the value of db_type is now postgres, while everything else is the same.
Loading data from SQLite
A local SQLite database can also be used for analysis with DataAnalyzr. You only need to pass the path to this database in the db_config parameter, as db_path.
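Before passing db_path, it can help to confirm that the file really is a readable SQLite database. This sketch uses only Python's sqlite3 module; sqlite_db_config is a hypothetical helper, and the db_type value "sqlite" in the commented call is an assumption.

```python
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path


def sqlite_db_config(db_path):
    """Hypothetical helper: confirm db_path is a readable SQLite file and list its tables."""
    path = Path(db_path)
    if not path.is_file():
        raise FileNotFoundError(path)
    with closing(sqlite3.connect(path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
        tables = [row[0] for row in rows]
    return {"db_path": str(path)}, tables


# Build a throwaway database so the example is self-contained.
db_file = Path(tempfile.mkdtemp()) / "demo.db"
with closing(sqlite3.connect(db_file)) as conn:
    conn.execute("CREATE TABLE sales (region TEXT, revenue REAL)")
    conn.commit()

db_config, tables = sqlite_db_config(db_file)
print(tables)  # ['sales']
# Assumed call shape:
# analyzr.get_data(db_type="sqlite", db_config=db_config, vector_store_config={})
```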
Getting results
You can use the DataAnalyzr object to analyze the loaded data by passing an analysis query to its ask method.
This method lets you ask questions directly related to the data at hand; DataAnalyzr processes the question and returns the corresponding visualisation, insights, recommendations and tasks.
In its simplest form, you call ask with just your question. The returned result is a dictionary with the keys visualisation, insights, recommendations and tasks.
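Handling the returned dictionary can be sketched independently of lyzr. Here summarize_result is a hypothetical helper, passing the question positionally to ask is an assumption, and the placeholder dictionary stands in for a real result: its values are not actual DataAnalyzr output.

```python
EXPECTED_KEYS = ("visualisation", "insights", "recommendations", "tasks")


def summarize_result(result):
    """Hypothetical helper: report which of the four documented keys carry a value."""
    missing = [key for key in EXPECTED_KEYS if key not in result]
    if missing:
        raise KeyError(f"result is missing keys: {missing}")
    return {key: bool(result[key]) for key in EXPECTED_KEYS}


# With lyzr installed, result would come from a call such as
#   result = analyzr.ask("Which region has the highest revenue?")
# (passing the question positionally is an assumption). A placeholder
# dictionary stands in for it here:
result = {
    "visualisation": "./generated_plots/plot.png",
    "insights": "<insights text>",
    "recommendations": "<recommendations text>",
    "tasks": None,
}
print(summarize_result(result))
```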
You can control the outputs received from ask with its outputs parameter. The result keeps the keys "visualisation", "insights", "recommendations" and "tasks", but their values change depending on the outputs you request.
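Because the output names use the British spelling "visualisation", a typo such as "visualization" is easy to make. The sketch below checks an outputs list before calling ask; normalize_outputs is a hypothetical helper, and it assumes that the accepted output names are exactly the four result keys.

```python
# The four output names, spelled as the result keys ("visualisation", not "visualization").
VALID_OUTPUTS = ("visualisation", "insights", "recommendations", "tasks")


def normalize_outputs(outputs):
    """Hypothetical pre-check: reject unknown names, drop duplicates, keep order."""
    unknown = [name for name in outputs if name not in VALID_OUTPUTS]
    if unknown:
        raise ValueError(f"unknown outputs {unknown}; expected a subset of {VALID_OUTPUTS}")
    return list(dict.fromkeys(outputs))


outputs = normalize_outputs(["insights", "visualisation", "insights"])
print(outputs)  # ['insights', 'visualisation']
# Assumed call shape:
# result = analyzr.ask("Which region has the highest revenue?", outputs=outputs)
```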
Getting visualisations
To retrieve plots from your analysis, use the ask method and include "visualisation" in the outputs parameter.
The returned dictionary has a "visualisation" key containing the path to the PNG image. By default, visualisation images are saved to ./generated_plots/plot.png, but this can be controlled with the plot_path parameter. When plot_path is set, the "visualisation" key simply holds the value of plot_path, and the generated image is saved at that location.
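A custom plot_path can be prepared defensively before calling ask. This page does not say whether DataAnalyzr creates missing directories itself, so the hypothetical prepare_plot_path helper creates the parent directory and requires a .png name; the commented ask call, including passing plot_path as a keyword, is an assumed call shape.

```python
import tempfile
from pathlib import Path

DEFAULT_PLOT_PATH = "./generated_plots/plot.png"  # default location described above


def prepare_plot_path(plot_path=DEFAULT_PLOT_PATH):
    """Hypothetical helper: require a .png target and create its parent directory."""
    path = Path(plot_path)
    if path.suffix.lower() != ".png":
        raise ValueError(f"plot_path should end in .png, got {str(plot_path)!r}")
    path.parent.mkdir(parents=True, exist_ok=True)
    return str(path)


plot_path = prepare_plot_path(Path(tempfile.mkdtemp()) / "plots" / "revenue.png")
# Assumed call shape:
# result = analyzr.ask("Plot revenue by region", outputs=["visualisation"], plot_path=plot_path)
# result["visualisation"] would then equal plot_path.
```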
Additionally, you may pass context to guide the image generation.
The saved PNG can then be displayed with an image library such as pillow or matplotlib.