I did a first pass over my ChatGPT history export. Here's a surface-level summary of what I found:
Content Types
Most messages are plain text. Assistant messages also include the code, thinking, and reasoning_recap content types; user messages are almost entirely text, with a small amount of multimodal_text.
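A rough sketch of how these counts can be pulled from the export, assuming the standard conversations.json layout (a list of conversations, each with a "mapping" of message nodes); the exact schema can vary between export versions:

```python
import json
from collections import Counter

# Load the export's conversations.json.
with open("conversations.json") as f:
    conversations = json.load(f)

# Count (role, content_type) pairs across all message nodes.
type_counts = Counter()
for conv in conversations:
    for node in conv["mapping"].values():
        msg = node.get("message")
        if not msg:
            continue
        role = msg["author"]["role"]
        content_type = msg["content"].get("content_type", "unknown")
        type_counts[(role, content_type)] += 1

for (role, content_type), n in type_counts.most_common():
    print(f"{role:10s} {content_type:20s} {n}")
```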
Conversation Length
Conversation length varies widely: a few conversations are far longer than the rest (e.g., "Genkit to Gemini Refactor" has the most messages).
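Per-conversation message counts fall out of the same structure; this sketch reuses the conversations list loaded above and simply counts non-empty message nodes per title:

```python
# Messages per conversation title (titles are not guaranteed unique).
msg_counts = {
    conv.get("title", "(untitled)"): sum(
        1 for node in conv["mapping"].values() if node.get("message")
    )
    for conv in conversations
}

# Longest conversations first.
for title, n in sorted(msg_counts.items(), key=lambda kv: kv[1], reverse=True)[:10]:
    print(f"{n:4d}  {title}")
```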
Message Length (Word Count)
Assistant messages: ~236 words on average.
User messages: ~222 words on average, but with high variability, including some very long inputs.
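The word counts come from joining each message's text parts and splitting on whitespace; a minimal sketch under the same schema assumptions as above:

```python
from statistics import mean

# Collect per-message word counts for user and assistant messages.
words_by_role = {"user": [], "assistant": []}
for conv in conversations:
    for node in conv["mapping"].values():
        msg = node.get("message")
        if not msg or msg["author"]["role"] not in words_by_role:
            continue
        parts = msg["content"].get("parts") or []
        text = " ".join(p for p in parts if isinstance(p, str))
        if text.strip():
            words_by_role[msg["author"]["role"]].append(len(text.split()))

for role, counts in words_by_role.items():
    print(f"{role}: mean={mean(counts):.0f} words, n={len(counts)}")
```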
Temporal Trends (Monthly Aggregations)
The data spans Feb 2023 to May 2025. Over this period I tracked monthly averages for words per conversation, messages per month, type-token ratio (TTR), and subjectivity/objectivity, which shows how language style fluctuates over time.
User messages in particular show noticeable month-to-month variation in average word count, TTR, and subjectivity.
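A sketch of the monthly aggregation using pandas. The subjectivity score here uses TextBlob, which is an assumption on my part; any sentence-level subjectivity scorer could be substituted, and TTR is computed naively as unique tokens over total tokens per message:

```python
import pandas as pd
from textblob import TextBlob  # assumed subjectivity scorer

rows = []
for conv in conversations:
    for node in conv["mapping"].values():
        msg = node.get("message")
        if not msg or not msg.get("create_time"):
            continue
        parts = msg["content"].get("parts") or []
        text = " ".join(p for p in parts if isinstance(p, str)).strip()
        if not text:
            continue
        tokens = text.lower().split()
        rows.append({
            "role": msg["author"]["role"],
            "month": pd.to_datetime(msg["create_time"], unit="s").to_period("M"),
            "words": len(tokens),
            "ttr": len(set(tokens)) / len(tokens),
            "subjectivity": TextBlob(text).sentiment.subjectivity,
        })

df = pd.DataFrame(rows)
monthly = df[df.role == "user"].groupby("month")[["words", "ttr", "subjectivity"]].mean()
print(monthly)
```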
Next Steps
The goal is to track the quality of GPT responses over time, which this pass doesn't achieve yet. My plan is to use LLM-based evaluation, with an LLM acting as a judge that scores each assistant response.
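A minimal sketch of what that LLM-as-judge step might look like; the judge model and rubric below are placeholders rather than settled choices:

```python
from openai import OpenAI

client = OpenAI()

RUBRIC = (
    "Rate the assistant response to the user prompt on a 1-5 scale for "
    "correctness, relevance, and clarity. Reply with a single integer."
)

def score_response(user_prompt: str, assistant_response: str) -> int:
    # Ask a judge model to grade one prompt/response pair.
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed judge model; swap for whatever you prefer
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"Prompt:\n{user_prompt}\n\nResponse:\n{assistant_response}"},
        ],
    )
    return int(completion.choices[0].message.content.strip())
```

Scoring every assistant message this way, then averaging scores per month, would give the quality-over-time curve the rest of this analysis is missing.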
This analysis provided a baseline for understanding message patterns and trends, but further refinement is needed to track the quality of assistant responses effectively.