Why brand mentions make good network data
A tweet mentioning a brand is not just an opinion — it is a signal in a conversation. When user A tweets at brand B, and user C retweets that, and brand B replies to user D, a structure emerges: clusters of engaged users, bridges between communities, and hubs that disproportionately amplify messages. Standard sentiment analysis misses all of this because it treats each tweet as an isolated data point.
I wanted to understand the shape of brand conversations, not just their sentiment. So I built directed mention graphs for Nike, Adidas, and Lululemon using over 150,000 tweets — and what came back was a clear picture of three very different community architectures.
The dataset and graph construction
The dataset comprised 150k+ tweets collected via the Twitter API, filtered to posts that mentioned at least one of the three brand accounts. Each tweet was parsed to extract directed edges: if user A mentioned user B in a tweet, that created an edge A → B in the graph. Brand accounts themselves were included as nodes so that direct brand engagement would be visible in the topology.
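The edge-extraction step can be sketched roughly as follows. This is a minimal illustration, not the parsing code used for the analysis: the record fields (`author`, `text`), the regex-based handle matching, and the sample tweets are all assumptions made for the example. Twitter API payloads usually also expose structured mention entities, which would be more reliable than a regex.

```python
import re
from collections import Counter

# Assumed handle pattern: 1-15 letters, digits, or underscores after "@".
# The lookbehind avoids matching the "@" inside email-like strings.
MENTION_RE = re.compile(r"(?<![A-Za-z0-9_])@([A-Za-z0-9_]{1,15})(?![A-Za-z0-9_])")

def mention_edges(tweets):
    """Return a Counter mapping (author, mentioned) -> number of mentions."""
    edges = Counter()
    for tweet in tweets:
        author = tweet["author"].lower()
        for handle in MENTION_RE.findall(tweet["text"]):
            target = handle.lower()
            if target != author:  # skip self-mentions
                edges[(author, target)] += 1
    return edges

# Hypothetical sample records; field names and contents are illustrative only.
sample = [
    {"author": "runner_amy", "text": "Loving the new @Nike Pegasus, thanks @coach_ben!"},
    {"author": "coach_ben", "text": "@runner_amy told you so. @nike"},
    {"author": "Nike", "text": "@runner_amy glad you like them"},
]
edges = mention_edges(sample)
for (source, target), count in sorted(edges.items()):
    print(f"{source} -> {target} (x{count})")
```

Lowercasing handles keeps "@Nike" and "@nike" on the same node, and because the brand account appears as an author in the third record, its reply becomes an outgoing edge like any other user's.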
I used Python's NetworkX library to construct and analyze the graphs, computing degree centrality, betweenness centrality, and clustering coefficients for each brand network. For visualization I used Altair layered with custom coordinate layouts to make community structures legible at scale.
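As a rough sketch of how these metrics might be computed with NetworkX, assuming a toy edge list rather than the real data: the node names below are hypothetical, and measuring clustering on the undirected projection is one possible choice, not necessarily the one used in the analysis.

```python
import networkx as nx

# Hypothetical toy edge list of (author, mentioned) pairs; not study data.
edges = [
    ("amy", "nike"), ("ben", "nike"), ("cat", "nike"),
    ("amy", "ben"), ("ben", "amy"), ("ben", "cat"),
]
G = nx.DiGraph()
G.add_edges_from(edges)

# Degree centrality counts in- and out-edges, normalized by (n - 1).
degree = nx.degree_centrality(G)
# In-degree centrality isolates how often a node is mentioned.
in_degree = nx.in_degree_centrality(G)
# Betweenness follows directed shortest paths through each node.
betweenness = nx.betweenness_centrality(G)
# Assumption: clustering is measured on the undirected projection.
avg_clustering = nx.average_clustering(G.to_undirected())

print("in-degree:", in_degree)
print("betweenness:", betweenness)
print("average clustering:", round(avg_clustering, 3))
```

In this toy graph the brand node has the highest in-degree but zero betweenness, because it never mentions anyone and so sits on no directed path between users. That is the signature of a pure broadcast target.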
In parallel, I built a semantic co-occurrence network from the same corpus: two words are connected if they appear together in tweets above a frequency threshold, and each edge is weighted by pointwise mutual information (PMI). This surfaces the language and themes that cluster around each brand, beyond what the user graph alone shows.
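A minimal sketch of PMI-weighted co-occurrence edges, using the standard definition PMI(x, y) = log2(p(x, y) / (p(x) p(y))) with probabilities estimated as the share of tweets containing each word or pair. The function name, the `min_count` threshold, and the toy token lists are assumptions for illustration. Tokenization and stop-word handling are left out.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_edges(token_lists, min_count=2):
    """Return {(word_a, word_b): PMI} for pairs co-occurring in >= min_count tweets."""
    n_docs = len(token_lists)
    word_df = Counter()  # number of tweets containing each word
    pair_df = Counter()  # number of tweets containing each word pair
    for tokens in token_lists:
        vocab = sorted(set(tokens))  # count each word once per tweet
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    edges = {}
    for (a, b), n_ab in pair_df.items():
        if n_ab < min_count:
            continue  # frequency threshold
        # p(a, b) / (p(a) * p(b)) simplifies to n_ab * n_docs / (n_a * n_b)
        edges[(a, b)] = math.log2(n_ab * n_docs / (word_df[a] * word_df[b]))
    return edges

# Hypothetical tokenized tweets, for illustration only.
docs = [
    ["yoga", "leggings", "studio"],
    ["yoga", "leggings", "sale"],
    ["sneakers", "drop", "sale"],
    ["sneakers", "drop", "release"],
]
edges = pmi_edges(docs, min_count=2)
print(edges)
```

The frequency threshold matters because PMI is inflated for rare pairs: two words that each appear once, in the same tweet, would otherwise receive the highest possible score.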
What the graphs revealed
Nike: the largest node count and the highest total mention volume by a wide margin. The graph is broad and diffuse, with high reach and lower clustering.
Adidas: fewer total mentions but a significantly higher clustering coefficient. Users mention each other, not just the brand, which is a sign of genuine community identity.
Lululemon: the most semantically concentrated community. Co-occurrence analysis showed tight clustering around wellness, yoga, and athleisure themes, pointing to a narrow but loyal audience.
The difference between Nike and Adidas was particularly striking. Nike's graph resembled a broadcast network — many users mentioning the brand, fewer mentioning each other. Adidas had a fundamentally different topology: denser subgraphs, higher reciprocity, and more user-to-user edges relative to user-to-brand edges. This pattern is associated with community-driven engagement rather than brand-driven amplification.
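The two topology signals named here, reciprocity and the share of user-to-user edges, can be sketched in plain Python. The helper name and both toy edge lists are hypothetical. They illustrate the contrast in shape and do not reproduce the measured values.

```python
def topology_summary(edges, brand):
    """Summarize reciprocity and user-to-user share for a directed edge list."""
    edge_set = {(a, b) for a, b in edges if a != b}  # unique edges, no self-loops
    total = len(edge_set)
    # An edge is reciprocated when the reverse edge also exists.
    reciprocated = sum(1 for a, b in edge_set if (b, a) in edge_set)
    # A user-to-user edge is one that does not touch the brand node.
    user_to_user = sum(1 for a, b in edge_set if brand not in (a, b))
    return {
        "reciprocity": reciprocated / total if total else 0.0,
        "user_to_user_share": user_to_user / total if total else 0.0,
    }

# Hypothetical broadcast-style graph: everyone mentions the brand only.
broadcast = [("u1", "brand"), ("u2", "brand"), ("u3", "brand"), ("u4", "brand")]
# Hypothetical community-style graph: users also mention each other.
community = [
    ("u1", "brand"), ("u1", "u2"), ("u2", "u1"),
    ("u2", "u3"), ("u3", "u2"), ("u3", "brand"),
]
broadcast_stats = topology_summary(broadcast, "brand")
community_stats = topology_summary(community, "brand")
print(broadcast_stats)
print(community_stats)
```

Reciprocity here is the fraction of directed edges whose reverse edge also exists, which matches the usual graph-level definition.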
Semantic networks: what language surrounds each brand
The word co-occurrence networks reinforced the graph findings. Nike's semantic network was broad and heterogeneous (sneakers, sports, athletes, culture, releases), reflecting its position as a mass-market brand spanning many contexts. Adidas clustered more tightly around sport-specific and subcultural terms. Lululemon's semantic space was the most compact: it was dominated by wellness and lifestyle vocabulary, with minimal overlap with the other two brands.
This has practical implications for content strategy. Brands with diffuse semantic networks face a harder task maintaining consistent identity; brands with tight semantic clusters have stronger but narrower positioning to defend.
What I would do differently
The main limitation of this analysis is temporal flatness — the full dataset is treated as a single snapshot, which obscures how community structure evolves around events like product launches, controversies, or athlete endorsements. A time-sliced graph, comparing topology before and after a specific event, would make the analysis much more actionable.
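One way such a time-sliced comparison could look, as a sketch only: the record fields (`source`, `target`, `created_at`), the event date, and the choice of graph density as the comparison metric are all assumptions for illustration.

```python
from datetime import datetime

def split_edges_by_event(records, event_time):
    """Partition mention records into edge lists before and after an event."""
    before, after = [], []
    for record in records:
        bucket = before if record["created_at"] < event_time else after
        bucket.append((record["source"], record["target"]))
    return before, after

def density(edges):
    """Directed density: unique edges divided by n * (n - 1) possible edges."""
    unique = {(a, b) for a, b in edges if a != b}
    nodes = {node for edge in unique for node in edge}
    n = len(nodes)
    return len(unique) / (n * (n - 1)) if n > 1 else 0.0

# Hypothetical records around a made-up launch date.
records = [
    {"source": "amy", "target": "brand", "created_at": datetime(2023, 1, 1)},
    {"source": "ben", "target": "brand", "created_at": datetime(2023, 1, 2)},
    {"source": "amy", "target": "ben", "created_at": datetime(2023, 1, 10)},
    {"source": "ben", "target": "amy", "created_at": datetime(2023, 1, 11)},
    {"source": "cat", "target": "brand", "created_at": datetime(2023, 1, 12)},
]
launch = datetime(2023, 1, 5)  # assumed event timestamp
before, after = split_edges_by_event(records, launch)
print("before:", len(before), "edges, density", round(density(before), 3))
print("after:", len(after), "edges, density", round(density(after), 3))
```

The same split could feed any of the other metrics, such as clustering or reciprocity, so that each one is reported per window instead of once for the whole corpus.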
I would also incorporate account-level metadata (follower count, verified status, account age) to weight nodes by influence rather than treating all users as equivalent. High-follower accounts act as structural hubs, and their centrality in the graph carries more signal than that of low-follower users.
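A possible starting point for influence weighting, sketched under explicit assumptions: each incoming mention is weighted by the log of the mentioning account's follower count. The log scaling, the function name, and the sample follower numbers are illustrative choices, not part of the original analysis.

```python
import math
from collections import defaultdict

def weighted_in_degree(edges, followers):
    """Sum log1p(follower count of the mentioning account) per mentioned node."""
    scores = defaultdict(float)
    for source, target in edges:
        # log1p damps the heavy tail of follower counts and handles zero.
        scores[target] += math.log1p(followers.get(source, 0))
    return dict(scores)

# Hypothetical edges and follower counts, for illustration only.
edges = [
    ("big_account", "brand"),
    ("small_account", "brand"),
    ("small_account", "big_account"),
]
followers = {"big_account": 99_999, "small_account": 9}
scores = weighted_in_degree(edges, followers)
print(scores)
```

Verified status and account age could enter the same way as extra multipliers, but how to combine them is a modeling decision that would need validation against real engagement outcomes.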