Basic Attributes - Instance configuration attributes. Includes analysis_type, params, generator_llm, analysis_llm, context, and logger.
Data Related Attributes - Input datasets, database connections, and vector store connections. Includes df_dict, database_connector, and vector_store.
Analysis Related Attributes - Values generated during analysis. Includes analysis_code, analysis_guide, analysis_output, and plot_code.
Output Attributes - Values returned as responses. Includes plot_output, insights_output, recommendations_output, tasks_output, and ai_queries_output.
Basic Attributes
analysis_type
Literal['sql', 'ml', 'skip']
The type of analysis to be performed.
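Python does not enforce Literal annotations at runtime on its own, so a caller may want to check the value before passing it in. A minimal sketch; validate_analysis_type is a hypothetical helper, not part of the library.

```python
from typing import Literal, get_args

# The three values documented for analysis_type.
AnalysisType = Literal["sql", "ml", "skip"]

def validate_analysis_type(value: str) -> str:
    """Hypothetical helper: reject any value outside the documented options."""
    allowed = get_args(AnalysisType)  # ('sql', 'ml', 'skip')
    if value not in allowed:
        raise ValueError(f"analysis_type must be one of {allowed}, got {value!r}")
    return value
```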
params
Dictionary of class parameters, containing the following settings:
Maximum number of retries for the LLM calls and analysis. Default is 10.
Time limit in seconds for the LLM calls and analysis. Default is 45 for analysis and 60 for visualisation.
Whether to automatically add questions, together with their SQL query or Python code, to the vector store. Default is True.
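The settings above can be pictured as a defaults dictionary merged with user overrides. This is a sketch only: the listing does not give the actual key names, so every key below (max_retries, analysis_time_limit, visualisation_time_limit, auto_add_to_vector_store) and the merge_params helper are hypothetical.

```python
from typing import Any, Dict, Optional

# Hypothetical key names; only the values and meanings come from the listing above.
DEFAULT_PARAMS: Dict[str, Any] = {
    "max_retries": 10,                 # retries for LLM calls and analysis
    "analysis_time_limit": 45,         # seconds
    "visualisation_time_limit": 60,    # seconds
    "auto_add_to_vector_store": True,  # store questions with their SQL/Python code
}

def merge_params(overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return the defaults updated with user overrides, rejecting unknown keys."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULT_PARAMS)
    if unknown:
        raise KeyError(f"Unknown parameter(s): {sorted(unknown)}")
    return {**DEFAULT_PARAMS, **overrides}
```

Rejecting unknown keys catches typos that would otherwise be silently ignored.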
generator_llm
LLM instance used to generate the analysis. The default model is GPT-4o. For details on configuring the LLM, see the Large Language Models guide. It has the following attributes:
Name of the model to use.
API key for accessing the LLM service. Can also be set as an environment variable.
analysis_llm
LLM instance used to perform the analysis. The default model is GPT-4o. For details on configuring the LLM, see the Large Language Models guide. It has the following attributes:
Name of the model to use.
API key for accessing the LLM service. Can also be set as an environment variable.
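Both LLM attributes take a model name and an API key that may come from the environment. A minimal sketch of that lookup order, assuming an explicit key wins over the environment; LLMSettings is hypothetical, and both the model string "gpt-4o" and the variable name OPENAI_API_KEY are assumptions rather than the library's confirmed values.

```python
import os
from dataclasses import dataclass
from typing import Optional

@dataclass
class LLMSettings:
    """Hypothetical stand-in for the LLM configuration described above."""
    model: str = "gpt-4o"             # assumed identifier for the documented GPT-4o default
    api_key: Optional[str] = None
    env_var: str = "OPENAI_API_KEY"   # hypothetical; the real variable name may differ

    def resolve_api_key(self) -> str:
        # Prefer an explicitly passed key, then fall back to the environment.
        key = self.api_key or os.environ.get(self.env_var)
        if not key:
            raise ValueError(f"No API key given and {self.env_var} is not set")
        return key

# One instance per role, mirroring generator_llm and analysis_llm.
generator_settings = LLMSettings()
analysis_settings = LLMSettings()
```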
context
Dictionary of context for the analysis, with one entry per stage:
Context for the analysis.
Context for the visualisation generation.
Context for the insights generation.
Context for the recommendations generation.
Context for the tasks generation.
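One way to picture the context dictionary is one string per stage. A sketch only: the key names and example strings below are illustrative assumptions, and context_for is a hypothetical helper.

```python
from typing import Dict

# Hypothetical keys, one per stage listed above; the real key names may differ.
context: Dict[str, str] = {
    "analysis": "Amounts are in USD.",
    "visualisation": "Prefer bar charts for category comparisons.",
    "insights": "Focus on month-over-month changes.",
    "recommendations": "Keep recommendations actionable within a quarter.",
    "tasks": "Assign each task to a single team.",
}

def context_for(stage: str, ctx: Dict[str, str]) -> str:
    """Return the context for one stage, or an empty string if none was given."""
    return ctx.get(stage, "")
```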
logger
Logger object for logging messages.
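A standard-library logging.Logger is a natural candidate for this attribute, assuming the class accepts one. A minimal sketch; make_logger and the default name "analysis" are hypothetical.

```python
import logging

def make_logger(name: str = "analysis", level: int = logging.INFO) -> logging.Logger:
    """Hypothetical helper: return a named logger with a single stream handler."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid stacking duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```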
Data Related Attributes
df_dict
Dictionary of dataframes loaded from files or databases, keyed by table name:
df_dict = {
    "table_name": pandas.DataFrame,
}
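A df_dict of this shape can be built from files with pandas. In this sketch, in-memory CSV text stands in for real files and the table names and values are illustrative only; with files on disk, each source would simply be a path passed to pd.read_csv.

```python
import io

import pandas as pd

# Illustrative CSV sources standing in for real files.
csv_sources = {
    "customers": io.StringIO("id,name\n1,Ada\n2,Grace\n"),
    "orders": io.StringIO("order_id,customer_id,amount\n10,1,25.0\n11,2,40.5\n"),
}

# Build the table name -> DataFrame mapping described above.
df_dict = {name: pd.read_csv(source) for name, source in csv_sources.items()}
```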
database_connector
Database connector object for connecting to databases. The following attributes apply to PostgreSQL and Redshift databases:
Hostname of the database server.
Port number of the database server.
Username for the database connection.
Name of the database to connect to.
Password for the database connection.
Schema names to load.
Table names to load.
conn
psycopg2.connect or redshift_connector.connect or sqlite3.connect
Connection object for the database, as returned by the corresponding connect function.
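Loading named tables through a connection into a df_dict can be sketched with sqlite3, which needs no server. The table and its rows are illustrative, and load_tables is a hypothetical helper. Table names are interpolated into SQL, so they must come from trusted configuration.

```python
import sqlite3

import pandas as pd

def load_tables(conn, table_names):
    """Hypothetical helper: read each named table into a DataFrame keyed by name."""
    # Names are quoted but not parameterised; only pass trusted table names.
    return {name: pd.read_sql_query(f'SELECT * FROM "{name}"', conn) for name in table_names}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("north", 120.0), ("south", 80.0)])
conn.commit()

df_dict = load_tables(conn, ["sales"])
```

For PostgreSQL, the connection could instead come from psycopg2.connect(host=..., port=..., user=..., password=..., dbname=...), which lines up with the host, port, username, password, and database name attributes listed above. Recent pandas versions emit a warning when handed a DBAPI connection other than sqlite3 directly.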
vector_store
Vector store object for storing questions with their SQL queries or Python code. For details on configuring the vector store, see the Vector Store guide. It has the following attributes:
Path to the vector store file.
chroma_client
chromadb.PersistentClient
ChromaDB client object for storing vectors.
Collection object for storing documentation.
Collection object for storing DDL queries.
Collection object for storing question and SQL query pairs.
Collection object for storing question and Python code pairs.
Collection object for storing question and plot code pairs.
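The question-and-code collections exist so that a new question can be matched against previously answered ones. The toy below illustrates that idea only. It is not ChromaDB and not the library's retrieval method: it swaps in bag-of-words cosine similarity for real embeddings. ToyQuestionStore and the stored pairs are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

def _vectorise(text: str) -> Counter:
    """Bag-of-words term counts, a crude stand-in for an embedding model."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class ToyQuestionStore:
    """In-memory store of (question, code) pairs; an illustration, not ChromaDB."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, str, Counter]] = []

    def add(self, question: str, code: str) -> None:
        self._items.append((question, code, _vectorise(question)))

    def most_similar(self, question: str, k: int = 1) -> List[Tuple[str, str, float]]:
        query = _vectorise(question)
        scored = [(q, c, _cosine(query, vec)) for q, c, vec in self._items]
        return sorted(scored, key=lambda item: item[2], reverse=True)[:k]

# One store per kind of pair, loosely mirroring the separate collections above.
sql_store = ToyQuestionStore()
sql_store.add("total sales by region", "SELECT region, SUM(amount) FROM sales GROUP BY region")
sql_store.add("number of customers", "SELECT COUNT(*) FROM customers")
```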
Analysis Related Attributes
analysis_code
Code generated by the LLM for the analysis.
analysis_guide
Guide used to generate the analysis code.
analysis_output
pandas.DataFrame or dictionary or string
Output generated by executing the analysis code.
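Because analysis_output can be a DataFrame, a dictionary, or a string, code that consumes it has to branch on the type. A minimal sketch; describe_output is a hypothetical helper that renders each documented type as plain text.

```python
from typing import Union

import pandas as pd

AnalysisOutput = Union[pd.DataFrame, dict, str]

def describe_output(output: AnalysisOutput) -> str:
    """Hypothetical helper: render any of the three documented types as text."""
    if isinstance(output, pd.DataFrame):
        return output.to_string(index=False)
    if isinstance(output, dict):
        return "\n".join(f"{key}: {value}" for key, value in output.items())
    if isinstance(output, str):
        return output
    raise TypeError(f"Unexpected analysis_output type: {type(output).__name__}")
```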
plot_code
Code generated by the LLM to create visualisations.
Output Attributes
plot_output
Path to a PNG file containing the plot generated by executing the plot code.
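Code that serves plot_output may want to confirm the path really holds a PNG. A minimal sketch using only the standard library; is_png is a hypothetical helper that checks the standard 8-byte PNG file signature.

```python
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the first 8 bytes of every PNG file

def is_png(path: str) -> bool:
    """Hypothetical helper: True if path is an existing file starting with the PNG signature."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as fh:
        return fh.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE
```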
insights_output
Insights generated by the LLM.
recommendations_output
Recommendations generated by the LLM.
tasks_output
Tasks generated by the LLM.
ai_queries_output
AI queries generated by the LLM, grouped by type of analysis:
ai_queries_output = {
    "type_of_analysis1": ["query1", "query2", "query3", "query4"],
    "type_of_analysis2": ["query1", "query2", "query3", "query4"],
    "type_of_analysis3": ["query1", "query2", "query3", "query4"],
}
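To present the suggested queries as a single list, the nested mapping can be flattened into (type of analysis, query) pairs while keeping their order. A minimal sketch; flatten_queries is a hypothetical helper.

```python
from typing import Dict, List, Tuple

def flatten_queries(ai_queries: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Hypothetical helper: turn {type of analysis: [queries]} into ordered pairs."""
    return [(kind, query) for kind, queries in ai_queries.items() for query in queries]
```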