GuglielmoTor committed on
Commit
feaf9aa
·
verified ·
1 Parent(s): 62d9a4c

Update insight_and_tasks/agents/post_agent.py

insight_and_tasks/agents/post_agent.py CHANGED
@@ -0,0 +1,531 @@
+ # agents/post_agent.py
+ import pandas as pd
+ from typing import Dict, List, Any, Optional
+ import logging
+ import os # Used by the __main__ test block below
+ import pandasai as pai # Assuming pandasai is imported as pai globally or configured
+
+ from google.adk.agents import LlmAgent # Assuming this is the correct import path
+
+ # Project-specific imports
+ from utils.retry_mechanism import RetryMechanism
+ from data_models.metrics import AgentMetrics, TimeSeriesMetric
+
+ # Configure logger for this module
+ logger = logging.getLogger(__name__)
+
+ DEFAULT_AGENT_MODEL = "gemini-2.5-flash-preview-05-20"
+
+ class EnhancedPostPerformanceAgent:
+     """
+     Enhanced post performance agent with time-series metric extraction and detailed analysis.
+     """
+     AGENT_NAME = "post_analyst"
+     AGENT_DESCRIPTION = "Expert analyst specializing in content performance trends and engagement patterns."
+     AGENT_INSTRUCTION = """
+     You are a specialized LinkedIn content performance expert focused on temporal engagement patterns,
+     content type effectiveness, and audience interaction.
+
+     Your role includes:
+
+     1. ENGAGEMENT TREND ANALYSIS (monthly, using 'published_at'):
+         - Analyze trends for key engagement metrics: likes, comments, shares, overall engagement ('engagement' column), impressions, clicks.
+         - Calculate and analyze engagement rate (engagement / impressionCount) over time.
+         - Calculate and analyze click-through rate (CTR: clickCount / impressionCount) over time.
+         - Identify periods of high/low engagement and potential drivers.
+
+     2. CONTENT TYPE & TOPIC PERFORMANCE:
+         - Compare performance across different media types (using 'media_type' column).
+         - Analyze performance by content topic/pillar (using 'li_eb_label' column).
+         - Identify which content types/topics drive the most engagement, impressions, or clicks.
+         - Analyze if the effectiveness of certain media types or topics changes over time.
+
+     3. POSTING BEHAVIOR ANALYSIS:
+         - Analyze posting frequency (e.g., posts per week/month) and its potential impact on overall engagement or reach.
+         - Identify if there are optimal posting times or days based on engagement patterns (if 'published_at' provides time detail).
+
+     4. SENTIMENT ANALYSIS (if 'sentiment' column is available):
+         - Analyze the distribution of sentiment (e.g., positive, negative, neutral) associated with posts.
+         - Track how average sentiment of posts evolves over time.
+
+     5. AD PERFORMANCE (if 'is_ad' column is available):
+         - Compare performance (engagement, reach, CTR) of ad posts vs. organic posts.
+
+     6. METRIC EXTRACTION (for AgentMetrics):
+         - Extract time-series data for average monthly engagement metrics (likes, comments, engagement rate, CTR, etc.).
+         - Provide aggregate performance metrics (e.g., overall average engagement rate, total impressions, top performing media type).
+         - Include distributions for content types, topics, and sentiment as categorical metrics.
+
+     Focus on actionable insights. What content resonates most? When is the audience most active? How can strategy be improved?
+     Structure your analysis clearly. Use the provided DataFrame columns ('published_at', 'media_type', 'li_eb_label',
+     'likeCount', 'commentCount', 'shareCount', 'engagement', 'impressionCount', 'clickCount', 'sentiment', 'is_ad').
+     """
+
+     def __init__(self, api_key: str, model_name: Optional[str] = None):
+         self.api_key = api_key
+         self.model_name = model_name or DEFAULT_AGENT_MODEL
+         self.agent = LlmAgent(
+             name=self.AGENT_NAME,
+             model=self.model_name,
+             description=self.AGENT_DESCRIPTION,
+             instruction=self.AGENT_INSTRUCTION
+         )
+         self.retry_mechanism = RetryMechanism()
+         logger.info(f"{self.AGENT_NAME} initialized with model {self.model_name}.")
+
+     def _preprocess_post_data(self, df: pd.DataFrame) -> pd.DataFrame:
+         """Cleans and prepares post data for analysis."""
+         if df is None or df.empty:
+             return pd.DataFrame()
+
+         df_processed = df.copy()
+
+         # Convert 'published_at' to datetime
+         if 'published_at' in df_processed.columns:
+             df_processed['published_at'] = pd.to_datetime(df_processed['published_at'], errors='coerce')
+             # df_processed.dropna(subset=['published_at'], inplace=True) # Keep rows even if date is NaT for other metrics
+         else:
+             logger.warning("'published_at' column not found. Time-series analysis will be limited.")
+             # Add a placeholder if critical for downstream, or handle absence gracefully
+             # df_processed['published_at'] = pd.NaT
+
+         # Ensure numeric types for engagement metrics, coercing errors and filling NaNs
+         metric_cols = ['likeCount', 'commentCount', 'shareCount', 'engagement', 'impressionCount', 'clickCount']
+         for col in metric_cols:
+             if col in df_processed.columns:
+                 df_processed[col] = pd.to_numeric(df_processed[col], errors='coerce').fillna(0)
+             else:
+                 logger.info(f"Metric column '{col}' not found in post data. Will be treated as 0.")
+                 df_processed[col] = 0 # Add column with zeros if missing
+
+         # Calculate Engagement Rate and CTR where possible
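+         # engagement_rate = engagement / impressionCount and ctr = clickCount / impressionCount;
+         # rows with zero impressions default to 0.0 to avoid division by zero.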
+         if 'impressionCount' in df_processed.columns and 'engagement' in df_processed.columns:
+             df_processed['engagement_rate'] = df_processed.apply(
+                 lambda row: (row['engagement'] / row['impressionCount']) if row['impressionCount'] > 0 else 0.0, axis=1
+             )
+         else:
+             df_processed['engagement_rate'] = 0.0
+
+         if 'impressionCount' in df_processed.columns and 'clickCount' in df_processed.columns:
+             df_processed['ctr'] = df_processed.apply(
+                 lambda row: (row['clickCount'] / row['impressionCount']) if row['impressionCount'] > 0 else 0.0, axis=1
+             )
+         else:
+             df_processed['ctr'] = 0.0
+
+         # Handle 'is_ad' boolean conversion if it exists (missing values are treated as organic)
+         if 'is_ad' in df_processed.columns:
+             df_processed['is_ad'] = df_processed['is_ad'].fillna(False).astype(bool)
+         else:
+             df_processed['is_ad'] = False # Assume organic if not specified
+
+         # Handle 'sentiment' - fill NaNs first, then ensure it's string
+         if 'sentiment' in df_processed.columns:
+             df_processed['sentiment'] = df_processed['sentiment'].fillna('Unknown').astype(str)
+         else:
+             df_processed['sentiment'] = 'Unknown'
+
+         # Handle 'media_type' and 'li_eb_label' - fill NaNs first, then ensure string
+         for col in ['media_type', 'li_eb_label']:
+             if col in df_processed.columns:
+                 df_processed[col] = df_processed[col].fillna('Unknown').astype(str)
+             else:
+                 df_processed[col] = 'Unknown'
+
+         return df_processed
+
+     def _extract_time_series_metrics(self, df_processed: pd.DataFrame) -> List[TimeSeriesMetric]:
+         """Extracts monthly time-series metrics from processed post data."""
+         ts_metrics = []
+         if df_processed.empty or 'published_at' not in df_processed.columns or df_processed['published_at'].isnull().all():
+             logger.info("Cannot extract time-series metrics for posts: 'published_at' is missing or all null.")
+             return ts_metrics
+
+         # Filter out rows where 'published_at' is NaT for time-series aggregation
+         df_ts = df_processed.dropna(subset=['published_at']).copy()
+         if df_ts.empty:
+             logger.info("No valid 'published_at' dates for post time-series metrics after filtering NaT.")
+             return ts_metrics
+
+         df_ts['year_month'] = df_ts['published_at'].dt.strftime('%Y-%m')
+
+         # Metrics to average monthly
+         metrics_to_agg = {
+             'likeCount': 'mean', 'commentCount': 'mean', 'shareCount': 'mean',
+             'engagement': 'mean', 'impressionCount': 'mean', 'clickCount': 'mean',
+             'engagement_rate': 'mean', 'ctr': 'mean'
+         }
+         # Filter out metrics not present in the DataFrame
+         available_metrics_to_agg = {k: v for k, v in metrics_to_agg.items() if k in df_ts.columns}
+
+         if not available_metrics_to_agg:
+             logger.info("No standard engagement metric columns found for time-series aggregation.")
+         else:
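+             # One row per calendar month, with the mean of each available metric for that month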
+             monthly_stats = df_ts.groupby('year_month').agg(available_metrics_to_agg).reset_index()
+             timestamps = monthly_stats['year_month'].tolist()
+
+             for metric_col, agg_type in available_metrics_to_agg.items():
+                 # Use original column name, or a more descriptive one like "avg_monthly_likes"
+                 ts_metrics.append(TimeSeriesMetric(
+                     metric_name=f"avg_monthly_{metric_col.lower()}",
+                     values=monthly_stats[metric_col].fillna(0).tolist(),
+                     timestamps=timestamps,
+                     metric_type="time_series",
+                     time_granularity="monthly",
+                     unit="%" if "_rate" in metric_col or "ctr" in metric_col else "count"
+                 ))
+
+         # Time series for sentiment distribution (count of posts by sentiment per month)
+         if 'sentiment' in df_ts.columns and df_ts['sentiment'].nunique() > 1: # if sentiment data exists
+             # Ensure 'sentiment' is not all 'Unknown'
+             if not (df_ts['sentiment'] == 'Unknown').all():
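+                 # Pivot to a month x sentiment matrix of post counts; missing month/sentiment combinations are filled with 0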
+                 sentiment_by_month = df_ts.groupby(['year_month', 'sentiment']).size().unstack(fill_value=0)
+                 for sentiment_value in sentiment_by_month.columns:
+                     if sentiment_value == 'Unknown' and (sentiment_by_month[sentiment_value] == 0).all():
+                         continue # Skip if 'Unknown' sentiment has no posts
+                     ts_metrics.append(TimeSeriesMetric(
+                         metric_name=f"monthly_post_count_sentiment_{str(sentiment_value).lower().replace(' ', '_')}",
+                         values=sentiment_by_month[sentiment_value].tolist(),
+                         timestamps=sentiment_by_month.index.tolist(), # year_month is the index
+                         metric_type="time_series",
+                         time_granularity="monthly",
+                         unit="count"
+                     ))
+             else:
+                 logger.info("Sentiment data is all 'Unknown', skipping sentiment time series.")
+
+         # Time series for post count
+         monthly_post_counts = df_ts.groupby('year_month').size().reset_index(name='post_count')
+         if not monthly_post_counts.empty:
+             ts_metrics.append(TimeSeriesMetric(
+                 metric_name="monthly_post_count",
+                 values=monthly_post_counts['post_count'].tolist(),
+                 timestamps=monthly_post_counts['year_month'].tolist(),
+                 metric_type="time_series",
+                 time_granularity="monthly",
+                 unit="count"
+             ))
+
+         return ts_metrics
+
+     def _calculate_aggregate_metrics(self, df_processed: pd.DataFrame) -> Dict[str, Any]:
+         """Calculates aggregate performance metrics for posts."""
+         agg_metrics = {}
+         if df_processed.empty:
+             return agg_metrics
+
+         # Overall averages and totals
+         metric_cols_for_agg = ['likeCount', 'commentCount', 'shareCount', 'engagement',
+                                'impressionCount', 'clickCount', 'engagement_rate', 'ctr']
+         for col in metric_cols_for_agg:
+             if col in df_processed.columns and pd.api.types.is_numeric_dtype(df_processed[col]):
+                 agg_metrics[f'overall_avg_{col.lower()}'] = float(df_processed[col].mean())
+                 if col not in ['engagement_rate', 'ctr']: # Totals make sense for counts
+                     agg_metrics[f'overall_total_{col.lower()}'] = float(df_processed[col].sum())
+
+         agg_metrics['total_posts_analyzed'] = float(len(df_processed))
+
+         # Posting frequency (posts per week)
+         if 'published_at' in df_processed.columns and not df_processed['published_at'].isnull().all():
+             df_dated = df_processed.dropna(subset=['published_at']).sort_values('published_at')
+             if len(df_dated) > 1:
+                 # Calculate total duration in days
+                 duration_days = (df_dated['published_at'].max() - df_dated['published_at'].min()).days
+                 if duration_days > 0:
+                     agg_metrics['avg_posts_per_week'] = float(len(df_dated) / (duration_days / 7.0))
+                 elif len(df_dated) > 0: # All posts on the same day or within a day
+                     agg_metrics['avg_posts_per_week'] = float(len(df_dated) * 7) # Extrapolate
+             elif len(df_dated) == 1:
+                 agg_metrics['avg_posts_per_week'] = 7.0 # One post, extrapolate to 7 per week
+
+         # Performance by media type and topic (as tables/structured dicts)
+         agg_metrics['performance_by_media_type'] = self._create_performance_table(df_processed, 'media_type')
+         agg_metrics['performance_by_topic'] = self._create_performance_table(df_processed, 'li_eb_label')
+
+         return agg_metrics
+
+     def _create_performance_table(self, df: pd.DataFrame, group_column: str) -> Dict[str, Any]:
+         """Helper to create a structured performance table for categorical analysis."""
+         if group_column not in df.columns or df[group_column].isnull().all() or (df[group_column] == 'Unknown').all():
+             return {"message": f"No data or only 'Unknown' values for grouping by {group_column}."}
+
+         # Filter out 'Unknown' category if it's the only one or for cleaner tables
+         df_filtered = df[df[group_column] != 'Unknown']
+         if df_filtered.empty: # If filtering 'Unknown' leaves no data, use original df but acknowledge
+             df_filtered = df
+             logger.info(f"Performance table for {group_column} includes 'Unknown' as it's the only/main category.")
+
+         # Define metrics to aggregate
+         agg_config = {
+             'engagement': 'mean',
+             'impressionCount': 'mean',
+             'clickCount': 'mean',
+             'likeCount': 'mean',
+             'commentCount': 'mean',
+             'shareCount': 'mean',
+             'engagement_rate': 'mean',
+             'ctr': 'mean',
+             'published_at': 'count' # To get number of posts per category
+         }
+         # Filter config for columns that actually exist in df_filtered
+         valid_agg_config = {k: v for k, v in agg_config.items() if k in df_filtered.columns or k == 'published_at'} # 'published_at' for count
+
+         if not valid_agg_config or 'published_at' not in valid_agg_config: # Need at least one metric or count
+             return {"message": f"Not enough relevant metric columns to create performance table for {group_column}."}
+
+         try:
+             # Group by the specified column and aggregate
+             # Rename 'published_at' count to 'num_posts' for clarity
+             grouped = df_filtered.groupby(group_column).agg(valid_agg_config).rename(
+                 columns={'published_at': 'num_posts'}
+             ).reset_index()
+
+             # Sort by a primary engagement metric, e.g., average engagement rate or num_posts
+             sort_key = 'num_posts'
+             if 'engagement_rate' in grouped.columns:
+                 sort_key = 'engagement_rate'
+             elif 'engagement' in grouped.columns:
+                 sort_key = 'engagement'
+
+             grouped = grouped.sort_values(by=sort_key, ascending=False)
+
+             # Prepare for JSON serializable output
+             table_data = []
+             for _, row in grouped.iterrows():
+                 row_dict = {'category': row[group_column]}
+                 for col in grouped.columns:
+                     if col == group_column: continue
+                     value = row[col]
+                     if isinstance(value, (int, float)):
+                         if "_rate" in col or "ctr" in col:
+                             row_dict[col] = f"{value:.2%}" # Percentage
+                         else:
+                             row_dict[col] = round(value, 2) if isinstance(value, float) else value
+                     else:
+                         row_dict[col] = str(value)
+                 table_data.append(row_dict)
+
+             return {
+                 "grouping_column": group_column,
+                 "columns_reported": [col for col in grouped.columns.tolist() if col != group_column],
+                 "data": table_data,
+                 "note": f"Top categories by {sort_key}."
+             }
+
+         except Exception as e:
+             logger.error(f"Error creating performance table for {group_column}: {e}", exc_info=True)
+             return {"error": f"Could not generate table for {group_column}: {e}"}
+
+     def _extract_categorical_metrics(self, df_processed: pd.DataFrame) -> Dict[str, Any]:
+         """Extracts distributions and other categorical insights for posts."""
+         cat_metrics = {}
+         if df_processed.empty:
+             return cat_metrics
+
+         # Media type distribution
+         if 'media_type' in df_processed.columns and df_processed['media_type'].nunique() > 0:
+             cat_metrics['media_type_distribution'] = df_processed['media_type'].value_counts(normalize=True).apply(lambda x: f"{x:.2%}").to_dict()
+             cat_metrics['media_type_counts'] = df_processed['media_type'].value_counts().to_dict()
+
+         # Topic distribution (li_eb_label)
+         if 'li_eb_label' in df_processed.columns and df_processed['li_eb_label'].nunique() > 0:
+             cat_metrics['topic_distribution'] = df_processed['li_eb_label'].value_counts(normalize=True).apply(lambda x: f"{x:.2%}").to_dict()
+             cat_metrics['topic_counts'] = df_processed['li_eb_label'].value_counts().to_dict()
+
+         # Sentiment distribution
+         if 'sentiment' in df_processed.columns and df_processed['sentiment'].nunique() > 0:
+             cat_metrics['sentiment_distribution'] = df_processed['sentiment'].value_counts(normalize=True).apply(lambda x: f"{x:.2%}").to_dict()
+             cat_metrics['sentiment_counts'] = df_processed['sentiment'].value_counts().to_dict()
+
+         # Ad vs. Organic performance summary
+         if 'is_ad' in df_processed.columns:
+             ad_summary = {}
+             for ad_status in [True, False]:
+                 subset = df_processed[df_processed['is_ad'] == ad_status]
+                 if not subset.empty:
+                     label = "ad" if ad_status else "organic"
+                     ad_summary[f'{label}_post_count'] = int(len(subset))
+                     ad_summary[f'{label}_avg_engagement_rate'] = float(subset['engagement_rate'].mean())
+                     ad_summary[f'{label}_avg_impressions'] = float(subset['impressionCount'].mean())
+                     ad_summary[f'{label}_avg_ctr'] = float(subset['ctr'].mean())
+             if ad_summary:
+                 cat_metrics['ad_vs_organic_summary'] = ad_summary
+
+         return cat_metrics
+
+     def _extract_time_periods(self, df_processed: pd.DataFrame) -> List[str]:
+         """Extracts unique year-month time periods covered by the post data."""
+         if df_processed.empty or 'published_at' not in df_processed.columns or df_processed['published_at'].isnull().all():
+             return ["Data period not available or N/A"]
+
+         # Use already created 'year_month' if available from preprocessing, or derive it
+         if 'year_month' in df_processed.columns:
+             periods = sorted(df_processed['year_month'].dropna().unique().tolist(), reverse=True)
+         elif 'published_at' in df_processed.columns: # Derive if not present
+             dates = df_processed['published_at'].dropna()
+             if not dates.empty:
+                 periods = sorted(dates.dt.strftime('%Y-%m').unique().tolist(), reverse=True)
+             else: return ["N/A"]
+         else: return ["N/A"]
+
+         return periods[:12] # Return up to the last 12 months
+
+     def analyze_post_data(self, post_df: pd.DataFrame) -> AgentMetrics:
+         """
+         Generates comprehensive post performance analysis.
+         """
+         if post_df is None or post_df.empty:
+             logger.warning("Post DataFrame is empty. Returning empty metrics.")
+             return AgentMetrics(
+                 agent_name=self.AGENT_NAME,
+                 analysis_summary="No post data provided for analysis.",
+                 time_periods_covered=["N/A"]
+             )
+
+         # 1. Preprocess data
+         df_processed = self._preprocess_post_data(post_df)
+         if df_processed.empty and not post_df.empty: # Preprocessing resulted in empty df
+             logger.warning("Post DataFrame became empty after preprocessing. Original data might have issues.")
+             return AgentMetrics(
+                 agent_name=self.AGENT_NAME,
+                 analysis_summary="Post data could not be processed (e.g., all dates invalid).",
+                 time_periods_covered=["N/A"]
+             )
+         elif df_processed.empty and post_df.empty: # Was already empty
+             # This case is handled by the initial check, but as a safeguard:
+             return AgentMetrics(agent_name=self.AGENT_NAME, analysis_summary="No post data provided.")
+
+         # 2. Generate textual analysis using PandasAI (similar to follower agent)
+         df_description_for_pandasai = "LinkedIn post performance data. Key columns: 'published_at' (date of post), 'media_type' (e.g., IMAGE, VIDEO, ARTICLE), 'li_eb_label' (content topic/pillar), 'likeCount', 'commentCount', 'shareCount', 'engagement' (sum of reactions, comments, shares), 'impressionCount', 'clickCount', 'sentiment' (post sentiment), 'is_ad' (boolean), 'engagement_rate', 'ctr'."
+
+         analysis_result_text = "PandasAI analysis for posts could not be performed."
+         try:
+             # Ensure PandasAI is configured
+             pandas_ai_df = pai.DataFrame(df_processed, description=df_description_for_pandasai)
+
+             analysis_query = f"""
+             Analyze the provided LinkedIn post performance data. Focus on:
+             1. Monthly trends for key metrics (engagement, impressions, engagement rate, CTR).
+             2. Performance comparison by 'media_type' and 'li_eb_label'. Which ones are most effective?
+             3. Impact of posting frequency (if derivable from 'published_at' timestamps).
+             4. Sentiment trends and distribution.
+             5. Differences in performance between ad posts ('is_ad'=True) and organic posts.
+             Provide a concise summary of findings and actionable recommendations.
+             """
+             def chat_operation():
+                 if not pai.config.llm:
+                     logger.warning("PandasAI LLM not configured for post agent. Attempting to configure.")
+                     from utils.pandasai_setup import configure_pandasai
+                     configure_pandasai(self.api_key, self.model_name)
+                     if not pai.config.llm:
+                         raise RuntimeError("PandasAI LLM could not be configured for post chat operation.")
+                 logger.info(f"Executing PandasAI chat for post analysis with LLM: {pai.config.llm}")
+                 return pandas_ai_df.chat(analysis_query)
+
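+             # Retry the PandasAI chat call with backoff (up to 2 retries, 2s base delay) before
+             # falling back to the error summary set in the except block below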
+             analysis_result_raw = self.retry_mechanism.retry_with_backoff(
+                 func=chat_operation, max_retries=2, base_delay=2.0, exceptions=(Exception,)
+             )
+             analysis_result_text = str(analysis_result_raw) if analysis_result_raw else "No textual analysis for posts generated by PandasAI."
+             logger.info("Post performance analysis via PandasAI completed.")
+
+         except Exception as e:
+             logger.error(f"Post analysis with PandasAI failed: {e}", exc_info=True)
+             analysis_result_text = f"Post analysis using PandasAI failed. Error: {str(e)[:200]}"
+
+         # 3. Extract structured metrics
+         time_series_metrics = self._extract_time_series_metrics(df_processed)
+         aggregate_metrics = self._calculate_aggregate_metrics(df_processed)
+         categorical_metrics = self._extract_categorical_metrics(df_processed)
+         time_periods = self._extract_time_periods(df_processed)
+
+         return AgentMetrics(
+             agent_name=self.AGENT_NAME,
+             analysis_summary=analysis_result_text[:2000],
+             time_series_metrics=time_series_metrics,
+             aggregate_metrics=aggregate_metrics,
+             categorical_metrics=categorical_metrics,
+             time_periods_covered=time_periods,
+             data_sources_used=[f"post_df (shape: {post_df.shape}) -> df_processed (shape: {df_processed.shape})"]
+         )
454
+ if __name__ == '__main__':
455
+ try:
456
+ from utils.logging_config import setup_logging
457
+ setup_logging()
458
+ logger.info("Logging setup for EnhancedPostPerformanceAgent test.")
459
+ except ImportError:
460
+ logging.basicConfig(level=logging.INFO)
461
+ logger.warning("Could not import setup_logging. Using basicConfig.")
462
+
463
+ MOCK_API_KEY = os.environ.get("GOOGLE_API_KEY", "test_api_key_posts")
464
+ MODEL_NAME = DEFAULT_AGENT_MODEL
465
+
466
+ try:
467
+ from utils.pandasai_setup import configure_pandasai
468
+ if MOCK_API_KEY != "test_api_key_posts":
469
+ configure_pandasai(MOCK_API_KEY, MODEL_NAME)
470
+ logger.info("PandasAI configured for testing EnhancedPostPerformanceAgent.")
471
+ else:
472
+ logger.warning("Using mock API key for posts. PandasAI chat will likely fail or use a mock.")
473
+ class MockPandasAIDataFrame:
474
+ def __init__(self, df, description): self.df = df; self.description = description
475
+ def chat(self, query): return f"Mock PandasAI post response to: {query}"
476
+ pai.DataFrame = MockPandasAIDataFrame
477
+ except ImportError:
478
+ logger.error("utils.pandasai_setup not found. PandasAI will not be configured for posts.")
479
+ class MockPandasAIDataFrame:
480
+ def __init__(self, df, description): self.df = df; self.description = description
481
+ def chat(self, query): return f"Mock PandasAI post response to: {query}"
482
+ pai.DataFrame = MockPandasAIDataFrame
483
+
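+     # Small synthetic dataset: three months of posts across several media types and topics,
+     # mixed sentiment, and both ad and organic posts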
+     sample_post_data = {
+         'published_at': pd.to_datetime(['2023-01-15', '2023-01-20', '2023-02-10', '2023-02-25', '2023-03-05', None]),
+         'media_type': ['IMAGE', 'VIDEO', 'IMAGE', 'ARTICLE', 'IMAGE', 'IMAGE'],
+         'li_eb_label': ['Product Update', 'Company Culture', 'Product Update', 'Industry Insights', 'Company Culture', 'Product Update'],
+         'likeCount': [100, 150, 120, 80, 200, 50],
+         'commentCount': [10, 20, 15, 5, 25, 3],
+         'shareCount': [5, 10, 8, 2, 12, 1],
+         'engagement': [115, 180, 143, 87, 237, 54], # Sum of likes, comments, shares
+         'impressionCount': [1000, 1500, 1200, 900, 2000, 600],
+         'clickCount': [50, 70, 60, 30, 90, 20],
+         'sentiment': ['Positive 👍', 'Positive 👍', 'Neutral 😐', 'Positive 👍', 'Negative 👎', 'Positive 👍'],
+         'is_ad': [False, False, True, False, False, True]
+     }
+     sample_df_posts = pd.DataFrame(sample_post_data)
+
+     post_agent = EnhancedPostPerformanceAgent(api_key=MOCK_API_KEY, model_name=MODEL_NAME)
+
+     logger.info("Analyzing sample post data...")
+     post_metrics_result = post_agent.analyze_post_data(sample_df_posts)
+
+     print("\n--- EnhancedPostPerformanceAgent Results ---")
+     print(f"Agent Name: {post_metrics_result.agent_name}")
+     print(f"Analysis Summary: {post_metrics_result.analysis_summary}")
+     print("\nTime Series Metrics (Post):")
+     for ts_metric in post_metrics_result.time_series_metrics:
+         print(f" - {ts_metric.metric_name}: {len(ts_metric.values)} data points, e.g., {ts_metric.values[:3]} for ts {ts_metric.timestamps[:3]} (Unit: {ts_metric.unit})")
+     print("\nAggregate Metrics (Post):")
+     for key, value in post_metrics_result.aggregate_metrics.items():
+         if isinstance(value, dict) and 'data' in value: # Performance table
+             print(f" - {key}: (Table - {value.get('grouping_column', '')}) - {len(value['data'])} categories")
+             for item in value['data'][:1]: # Print first item for brevity
+                 print(f" Example Category '{item.get('category')}': { {k:v for k,v in item.items() if k!='category'} }")
+         else:
+             print(f" - {key}: {value}")
+     print("\nCategorical Metrics (Post):")
+     for key, value in post_metrics_result.categorical_metrics.items():
+         print(f" - {key}:")
+         if isinstance(value, dict):
+             for sub_key, sub_value in list(value.items())[:2]:
+                 print(f" - {sub_key}: {sub_value}")
+         else:
+             print(f" {value}")
+     print(f"\nTime Periods Covered (Post): {post_metrics_result.time_periods_covered}")
+
+     # Test with empty DataFrame
+     logger.info("\n--- Testing Post Agent with empty DataFrame ---")
+     empty_post_metrics = post_agent.analyze_post_data(pd.DataFrame())
+     print(f"Empty Post DF Analysis Summary: {empty_post_metrics.analysis_summary}")