The Lifeblood of Modern Communication
3.1.1 Introduction & Importance
- Ubiquity: Email is the most common internet activity, used by about 90% of internet users. Unlike many social media tools, it spans all demographic groups, including older adults.
- The “Mirror” Effect: Analyzing your own email is like looking in a mirror: it shows your actual social hygiene and work relationships as they are, not as biased memory recalls them.
- Organizational Value: Email data can identify internal influencers, “silos” (disconnected groups), and bridge-spanners. It reflects actual behavior rather than self-reports, which are often biased.
3.1.2 Technical Characteristics of Email
- History: Email predates the internet. In 1971 Ray Tomlinson introduced the “@” symbol to separate the user name from the host name.
- Key Features:
  - Flexible Form: Supports plain text, rich newsletters, and attachments.
  - Asynchronous: No real-time interruption; encourages thoughtful responses.
  - Broadcast: Allows ad-hoc grouping and large-scale communication (Listservs).
  - Push Technology: Senders determine what arrives in the receiver’s inbox.
  - Threaded Conversation: Messages and their replies are grouped into threads that mimic natural turn-taking.
3.1.3 Email Networks
- Vertices (Nodes): Represent unique email addresses or people.
- Edges (Ties): Directed lines representing messages sent from one person to another.
- Weight: Typically based on the total volume of messages exchanged.
- Aggregation: Networks can be “rolled up” to show connections between entire departments or organizational units rather than individuals.
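
The vertex, edge, weight, and aggregation ideas above can be sketched in a few lines of Python. This is only an illustration using the standard-library `email` package, not the NodeXL workflow; the sample messages, the `unit_of` mapping, and the function names are all made up for the example.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses, parseaddr


def message_edges(raw_message):
    """Return directed (sender, recipient) pairs for one raw message."""
    msg = message_from_string(raw_message)
    sender = parseaddr(msg.get("From", ""))[1].lower()
    headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    # A set avoids counting someone twice if they appear in both To and Cc.
    recipients = {addr.lower() for _, addr in getaddresses(headers) if addr}
    return [(sender, r) for r in recipients] if sender else []


def build_network(raw_messages):
    """Edge weight = number of messages sent from one address to another."""
    weights = Counter()
    for raw in raw_messages:
        weights.update(message_edges(raw))
    return weights


def roll_up(weights, unit_of):
    """Aggregate person-level edges into unit-level edges."""
    rolled = Counter()
    for (src, dst), w in weights.items():
        rolled[(unit_of.get(src, "unknown"), unit_of.get(dst, "unknown"))] += w
    return rolled


# Hypothetical sample data.
SAMPLE = [
    "From: Ann <ann@example.com>\nTo: bob@example.com, Cy <cy@example.com>\nSubject: plan\n\nhi",
    "From: ann@example.com\nTo: bob@example.com\nSubject: re: plan\n\nok",
    "From: bob@example.com\nTo: ann@example.com\nCc: cy@example.com\nSubject: re: plan\n\nfine",
]
UNIT_OF = {
    "ann@example.com": "Sales",
    "bob@example.com": "Research",
    "cy@example.com": "Research",
}

network = build_network(SAMPLE)
units = roll_up(network, UNIT_OF)
print(network[("ann@example.com", "bob@example.com")])  # 2
print(units[("Sales", "Research")])                     # 3
```

Keeping direction in the edge key means a message from Ann to Bob and a message from Bob to Ann stay separate ties, which is what the in-degree and out-degree measures later in these notes rely on.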
3.1.4 Analytical Questions by Category
| Category | Key Focus Questions |
|---|---|
| Personal | Who are my boundary spanners? Which projects have I neglected? What are my natural subgroups? |
| Organizational | Who are the influencers/experts? How does the “real” communication flow differ from the formal org-chart? |
| Community | Who are the leaders in a public forum? Is the community healthy and growing? |
3.1.5 - 3.1.6 Data Preparation & Cleaning
- Protocols: SMTP handles sending; POP and IMAP handle retrieval.
- Entity Resolution: Matching multiple email addresses (aliases) to a single person. This is a major challenge in email analysis.
- Data Cleaning in NodeXL:
  - Remove Aliases: Use Find/Replace or a lookup table so that “john.doe@company.com” and “jdoe@gmail.com” map to the same vertex.
  - Merge Duplicate Edges: Roll up multiple individual messages between the same pair into a single weighted edge.
  - Remove Self-Loops: Delete edges where people email themselves, unless those messages (e.g., “reminders”) are relevant to the study.
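
The three cleaning steps can also be expressed outside NodeXL. The following is a minimal Python sketch, assuming the messages have already been reduced to (sender, recipient) pairs; the `ALIASES` lookup table and the function name `clean_edges` are hypothetical, and the addresses are the ones used as examples above.

```python
from collections import Counter

# Hypothetical lookup table: alias -> canonical address for the same person.
ALIASES = {
    "jdoe@gmail.com": "john.doe@company.com",
}


def clean_edges(message_pairs, aliases, keep_self_loops=False):
    """Resolve aliases, drop self-loops, and merge duplicates into weights."""
    merged = Counter()
    for sender, recipient in message_pairs:
        # Step 1: map every alias to one canonical vertex (case-insensitive).
        src = aliases.get(sender.lower(), sender.lower())
        dst = aliases.get(recipient.lower(), recipient.lower())
        # Step 2: skip self-loops unless the study needs them (e.g. reminders).
        if src == dst and not keep_self_loops:
            continue
        # Step 3: duplicate edges collapse into a single weighted edge.
        merged[(src, dst)] += 1
    return merged


pairs = [
    ("john.doe@company.com", "amy@company.com"),
    ("jdoe@gmail.com", "amy@company.com"),
    ("JDoe@gmail.com", "john.doe@company.com"),  # self-loop once aliases resolve
    ("amy@company.com", "john.doe@company.com"),
]
cleaned = clean_edges(pairs, ALIASES)
print(cleaned[("john.doe@company.com", "amy@company.com")])  # 2
```

Alias resolution runs before the self-loop check on purpose: a message from a personal address to a work address only shows up as a self-loop after both have been mapped to the same vertex.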
3.1.7 Case Study: Personal Email Analysis
- Overview Visualization: A snapshot of a month’s activity. To reduce clutter, analysts often filter out infrequent ties (e.g., edges with a weight < 5).
- Expertise Networks: Created by filtering the corpus for specific keywords (e.g., “NodeXL” or “Research”). This reveals who talks most about a specific topic.
- Common Patterns: “Fan” structures (hubs sending many messages but receiving few) and densely connected cliques (team collaborations).
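
The filtering steps in this case study (keyword selection, dropping weak ties, spotting “fan” hubs) might look like the sketch below. It assumes each message is a plain dictionary with `from`, `to`, `subject`, and `body` keys; that layout, the function names, and the thresholds for what counts as a fan are assumptions made for illustration.

```python
from collections import Counter


def keyword_network(messages, keyword):
    """Count sender -> recipient edges over messages that mention the keyword."""
    needle = keyword.lower()
    edges = Counter()
    for m in messages:
        if needle in (m["subject"] + " " + m["body"]).lower():
            for recipient in m["to"]:
                edges[(m["from"], recipient)] += 1
    return edges


def drop_weak_ties(edges, min_weight=5):
    """Keep only edges whose weight is at least min_weight."""
    return {pair: w for pair, w in edges.items() if w >= min_weight}


def fan_hubs(edges, min_out=3, max_in=0):
    """Vertices that send to many distinct people but hear from few."""
    out_neighbors, in_neighbors = {}, {}
    for src, dst in edges:
        out_neighbors.setdefault(src, set()).add(dst)
        in_neighbors.setdefault(dst, set()).add(src)
    return sorted(
        v for v, outs in out_neighbors.items()
        if len(outs) >= min_out and len(in_neighbors.get(v, ())) <= max_in
    )


# Hypothetical corpus: a newsletter sent five times, plus one casual mention.
newsletter = {"from": "news", "to": ["a", "b", "c"],
              "subject": "NodeXL release", "body": ""}
messages = [newsletter] * 5 + [
    {"from": "a", "to": ["b"], "subject": "lunch", "body": "NodeXL later?"},
]

topic = keyword_network(messages, "NodeXL")
strong = drop_weak_ties(topic, min_weight=5)
print(len(topic), len(strong))  # 4 3
print(fan_hubs(strong))         # ['news']
```

The threshold of 5 mirrors the example in the text; in practice it is a judgment call that depends on how long a period the snapshot covers.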
3.1.8 Case Study: Organizational “Living Org-Charts”
- Normalizing Data: In large companies like “TechABC,” absolute message counts are misleading because large units generate more email simply by having more people. Analysts use Messages per FTE (Full-Time Equivalent) to compare small and large units fairly.
- Bridge Spanners: Small units that link large, separate divisions are critical; they are often vulnerable to employee turnover.
- Research Division Example: Analysis showed that while many research units exist, they are often disconnected from one another, suggesting a lack of interdisciplinary collaboration.
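
To make the Messages per FTE idea concrete, here is a small Python sketch. The notes do not say exactly which headcount goes in the denominator, so this version assumes the message count between two units is divided by their combined FTE; that choice, the function name, and all numbers are hypothetical.

```python
def messages_per_fte(unit_edges, fte):
    """Divide each unit-to-unit message count by the FTE of the units involved.

    Assumption: the denominator is the combined FTE of both units
    (or the unit's own FTE for traffic within a single unit).
    """
    normalized = {}
    for (src, dst), count in unit_edges.items():
        denom = fte[src] if src == dst else fte[src] + fte[dst]
        normalized[(src, dst)] = count / denom
    return normalized


# Made-up monthly message counts and headcounts.
unit_edges = {
    ("Sales", "Research"): 1200,
    ("Legal", "Research"): 150,
}
fte = {"Sales": 380, "Research": 20, "Legal": 5}

rates = messages_per_fte(unit_edges, fte)
print(rates[("Sales", "Research")])  # 3.0
print(rates[("Legal", "Research")])  # 6.0
```

In raw counts the Sales tie looks eight times stronger, but per person the small Legal unit communicates with Research twice as intensively, which is the kind of reversal normalization is meant to surface.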
3.1.9 Case Study: Historical/Legal (Enron)
- The Dataset: ~500,000 emails made public after the Enron investigation.
- Content Networks: Filtering on keywords like “FERC” (Federal Energy Regulatory Commission) to find relevant clusters.
- Identifying Key Players: High in-degree (receiving messages from many people) can signal a central “go-to” person or, in legal contexts, a mastermind.
- Finding Violators: In the Enron data, Tim Belden (a key witness/conspirator) was identified by his unique position in the “FERC” communication network.
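
In-degree is simple to compute once a keyword network has been built. The sketch below assumes the network is a dictionary keyed by (sender, recipient) with message counts as values; the vertex names are placeholders, not real people from the Enron data, and a high score is a lead for further reading, not evidence of wrongdoing.

```python
from collections import Counter


def degree_table(edges):
    """Return (in_degree, out_degree), counting distinct correspondents."""
    in_deg, out_deg = Counter(), Counter()
    for src, dst in edges:  # each key is one merged, directed edge
        out_deg[src] += 1
        in_deg[dst] += 1
    return in_deg, out_deg


# Placeholder keyword network (sender, recipient) -> message count.
ferc_edges = {
    ("trader_a", "desk_head"): 12,
    ("trader_b", "desk_head"): 7,
    ("analyst", "desk_head"): 3,
    ("desk_head", "trader_a"): 4,
}

in_deg, out_deg = degree_table(ferc_edges)
print(in_deg.most_common(1))  # [('desk_head', 3)]
```

Because duplicate messages have already been merged into weighted edges, each key counts once, so the score reflects how many different people write to a vertex rather than how many messages arrive.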
3.1.10 - 3.1.11 Practitioner & Researcher Agenda
- Ethics: Monitoring email is legally permissible for most companies, but it requires transparency to avoid “Big Brother” perceptions.
- Future Research:
  - Forensic Tools: Improving how we find documents in massive archives.
  - Temporal Changes: Detecting sharp increases/decreases in communication that signal major events (rumors, reorganizations, or crises).
  - Geo-location: Tying email to physical office locations to understand social flow.
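
As one illustration of the Temporal Changes item, a very simple detector could compare each week's message count with the weeks just before it. The notes do not prescribe a method; the trailing-window z-score below, the window size, the threshold, and the sample counts are all assumptions chosen for the sketch.

```python
from statistics import mean, stdev


def flag_shifts(weekly_counts, window=4, threshold=3.0):
    """Return indexes of weeks that deviate sharply from the preceding weeks.

    A week is flagged when its count lies more than `threshold` standard
    deviations from the mean of the previous `window` weeks.
    """
    flagged = []
    for i in range(window, len(weekly_counts)):
        history = weekly_counts[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all is a deviation.
            if weekly_counts[i] != mu:
                flagged.append(i)
        elif abs(weekly_counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Made-up weekly message counts with one burst in week 5.
counts = [100, 104, 98, 102, 101, 240, 99, 103]
print(flag_shifts(counts))  # [5]
```

The weeks after the burst are not flagged because the burst itself inflates the trailing standard deviation; a more careful detector would exclude flagged weeks from the history.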