Analysis on my ChatGPT data

did an initial analysis of my ChatGPT history export. Here's surface-level, first-pass analysis:

Content Types

Most messages are textwith assistant messages also including codethinkingand reasoning_recapUser messages are primarily textwith small amount of multimodal_text.

Conversation Length

Some conversations are significantly longer (e.g., "Genkit to Gemini Refactor" has the most messages).

Message Length (Word Count)

  • Assistant messages: ~236 words on average.

  • User messages: ~222 words on average, but with high variabilityincluding some very long inputs.

python
df = df.withColumn("word_count", size(split(col("content_text"), r"\s+"))) df_user = df_user.withColumn("word_count", size(split(col("content_text"), r"\s+"))) df.select("word_count").describe().show() df_user.select("word_count").describe().show()

Temporal Trends (Monthly Aggregations)

The data spans Feb 2023 to May 2025tracked metrics like average words per conversationmessages per monthTTRand subjectivity/objectivity over this period, revealing fluctuations and trends in language style.

python
monthly = df.withColumn("month", date_format(col("create_time"), "yyyy-MM")) \ .groupBy("conversation_title", "month") \ .agg( sum("word_count").alias("word_count_per_conversation"), avg("word_count").alias("avg_word_count"), count("*").alias("message_count"), ) \ .orderBy("conversation_title", "month") monthly.show(n=5)

User messages show varying monthly averages for word countTTRand subjectivity.

Next Steps

The goal is to track the quality of GPT responses over time, which haven't achieved yet. To do this, plan to use LLM-based evaluation for quality assessment.

This analysis provided baseline for understanding message patterns and trends, but further refinement is needed to track the quality of assistant responses effectively.

No comments:

Post a Comment

"A Name, an Address, a Route" Haiku — Found in RFC 791: DARPA’s 1981 Internet Protocol

A name indicates what we seek.   An address indicates where it is.   A route indicates how to get there.   The internet protocol deals prima...