The Lifeblood of Modern Communication

3.1.1 Introduction & Importance

  • Ubiquity: Email is the most common internet activity, used by about 90% of internet users. Unlike many social media tools, it spans all demographic groups, including older adults.

  • The “Mirror” Effect: Analyzing your own email is like looking in a mirror; it shows your actual communication habits and work relationships rather than what you remember of them, which avoids memory bias.

  • Organizational Value: Used to identify internal influencers, “silos” (disconnected groups), and bridge-spanners. It is based on actual behavior rather than often-biased self-reports.


3.1.2 Technical Characteristics of Email

  • History: Older than the internet. Ray Tomlinson introduced the “@” symbol in 1971 to separate user and host.

  • Key Features:

    • Flexible Form: Supports plain text, rich newsletters, and attachments.

    • Asynchronous: No real-time interruption; encourages thoughtful responses.

    • Broadcast: Allows ad-hoc grouping and large-scale communication (Listservs).

    • Push Technology: Senders determine what arrives in the receiver’s inbox.

    • Threaded Conversation: Grouped collections of messages and replies mimic natural turn-taking.


3.1.3 Email Networks

  • Vertices (Nodes): Represent unique email addresses or people.

  • Edges (Ties): Directed lines representing messages sent from one person to another.

  • Weight: Typically based on the total volume of messages exchanged.

  • Aggregation: Networks can be “rolled up” to show connections between entire departments or organizational units rather than individuals.
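The vertex, edge, weight, and aggregation ideas above can be sketched in a few lines of Python. This is only an illustration: the addresses, the department lookup, and the function names are hypothetical, and the weight here counts messages per direction (sender to recipient), which is one common reading of "volume of messages exchanged".

```python
from collections import Counter

# Hypothetical sample data: one (sender, recipient) pair per message.
messages = [
    ("ann@example.com", "bob@example.com"),
    ("ann@example.com", "bob@example.com"),
    ("bob@example.com", "ann@example.com"),
    ("ann@example.com", "cat@example.com"),
]

# Hypothetical lookup from address to organizational unit.
department = {
    "ann@example.com": "Sales",
    "bob@example.com": "Research",
    "cat@example.com": "Research",
}

def build_edges(pairs):
    """Count messages per directed (sender, recipient) pair."""
    return Counter(pairs)

def roll_up(edges, unit_of):
    """Aggregate person-level edges into unit-level edges."""
    rolled = Counter()
    for (src, dst), weight in edges.items():
        rolled[(unit_of[src], unit_of[dst])] += weight
    return rolled

person_edges = build_edges(messages)
unit_edges = roll_up(person_edges, department)
```

In this sample, the two person-level edges from Ann to Bob and Cat collapse into a single Sales-to-Research edge with weight 3.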


3.1.4 Analytical Questions by Category

Category: Key Focus Questions

  • Personal: Who are my boundary spanners? Which projects have I neglected? What are my natural subgroups?

  • Organizational: Who are the influencers/experts? How does the “real” communication flow differ from the formal org-chart?

  • Community: Who are the leaders in a public forum? Is the community healthy and growing?

3.1.5 - 3.1.6 Data Preparation & Cleaning

  • Protocols: SMTP (sending), POP/IMAP (retrieving).

  • Entity Resolution: The process of matching multiple email addresses (aliases) to a single person. This is a major challenge in email analysis.

  • Data Cleaning in NodeXL:

    1. Remove Aliases: Use Find/Replace or a lookup table so that “john.doe@company.com” and “jdoe@gmail.com” map to the same vertex.

    2. Merge Duplicate Edges: Roll up multiple individual messages between the same pair into a single weighted edge.

    3. Remove Self-Loops: Delete edges where people email themselves (unless these are “reminders” relevant to the study).
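The three cleaning steps can also be expressed outside NodeXL. The sketch below is plain Python, not NodeXL's own feature set; the alias table, the canonical names, and the clean_edges function are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical alias table: every known address maps to one canonical identity.
ALIASES = {
    "john.doe@company.com": "John Doe",
    "jdoe@gmail.com": "John Doe",
    "mary@company.com": "Mary Roe",
}

def clean_edges(raw_messages, aliases, keep_self_loops=False):
    """Return {(sender, recipient): weight} after the three cleaning steps."""
    edges = Counter()
    for sender, recipient in raw_messages:
        # Step 1: resolve aliases; unknown addresses are kept, lowercased.
        src = aliases.get(sender.lower(), sender.lower())
        dst = aliases.get(recipient.lower(), recipient.lower())
        # Step 3: drop self-loops unless the study treats them as reminders.
        if src == dst and not keep_self_loops:
            continue
        # Step 2: merge duplicates by counting them into one weighted edge.
        edges[(src, dst)] += 1
    return dict(edges)

raw = [
    ("john.doe@company.com", "mary@company.com"),
    ("JDoe@gmail.com", "mary@company.com"),
    ("jdoe@gmail.com", "john.doe@company.com"),  # a self-loop once aliases resolve
]
cleaned = clean_edges(raw, ALIASES)
```

Resolving aliases before checking for self-loops matters: the third message only becomes a self-loop once both addresses are mapped to the same person.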


3.1.7 Case Study: Personal Email Analysis

  • Overview Visualization: A snapshot of a month’s activity. To reduce clutter, analysts often filter out infrequent ties (e.g., edges with a weight < 5).

  • Expertise Networks: Created by filtering the corpus for specific keywords (e.g., “NodeXL” or “Research”). This reveals who talks most about a specific topic.

  • Common Patterns: “Fan” structures (hubs sending many messages but receiving few) and densely connected cliques (team collaborations).
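The two filters described above (a keyword filter for expertise networks and a weight threshold to reduce clutter) might look like the following. The message records, field names, and function names are hypothetical, and matching on the subject line only is an assumption; a real analysis might also search message bodies.

```python
def expertise_edges(msgs, keyword):
    """Count directed edges using only messages whose subject mentions keyword."""
    edges = {}
    needle = keyword.lower()
    for m in msgs:
        if needle in m["subject"].lower():
            key = (m["from"], m["to"])
            edges[key] = edges.get(key, 0) + 1
    return edges

def drop_weak_ties(edges, min_weight=5):
    """Keep only edges whose weight is at least min_weight."""
    return {pair: w for pair, w in edges.items() if w >= min_weight}

# Hypothetical sample corpus.
msgs = (
    [{"from": "ann", "to": "bob", "subject": "NodeXL import question"}] * 6
    + [{"from": "ann", "to": "cat", "subject": "Lunch?"}] * 2
    + [{"from": "cat", "to": "ann", "subject": "Re: nodexl layout"}]
)

topic = expertise_edges(msgs, "NodeXL")
strong = drop_weak_ties(topic)  # removes edges with weight < 5
```

Applying the keyword filter first and the threshold second keeps the threshold meaningful: it then measures how often a pair discusses the topic, not how often they write to each other overall.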


3.1.8 Case Study: Organizational “Living Org-Charts”

  • Normalizing Data: In large companies like “TechABC,” absolute message counts are misleading. Analysts use Messages per FTE (Full-Time Equivalent) to compare small and large units fairly.

  • Bridge Spanners: Small units that link large, separate divisions are critical; they are often vulnerable to employee turnover.

  • Research Division Example: Analysis showed that while many research units exist, they are often disconnected from one another, suggesting a lack of interdisciplinary collaboration.
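As a rough illustration of the Messages per FTE idea, the sketch below divides each unit's message count by its headcount. The unit names and numbers are invented, and the exact normalization used in the case study is not given here, so treat this as one plausible reading.

```python
def messages_per_fte(message_counts, fte):
    """Divide each unit's message count by its FTE count, skipping units with no FTE data."""
    return {
        unit: message_counts[unit] / fte[unit]
        for unit in message_counts
        if fte.get(unit)
    }

# Hypothetical figures: total messages sent per unit and full-time equivalents.
sent = {"Sales": 12000, "Legal": 900}
headcount = {"Sales": 400.0, "Legal": 15.0}

rates = messages_per_fte(sent, headcount)
```

In this made-up example, Sales sends far more messages in absolute terms, but Legal is twice as active per person (60 versus 30 messages per FTE).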


3.1.9 Case Study: Historical/Legal (Enron)

  • The Dataset: ~500,000 emails made public after the Enron investigation.

  • Content Networks: Using keywords like “FERC” (Federal Energy Regulatory Commission) to find relevant clusters.

  • Identifying Key Players: High In-Degree (receiving many messages) can signal a central “go-to” person or, in legal contexts, a mastermind.

  • Finding Violators: In the Enron data, Tim Belden (a key witness/conspirator) was identified by his unique position in the “FERC” communication network.
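In-degree can be computed directly from an edge list. The sketch below uses placeholder vertex names rather than real Enron data, and it assumes in-degree means the number of distinct senders; summing message counts instead would give a weighted in-degree.

```python
from collections import defaultdict

def degree_table(edges):
    """Return {vertex: (in_degree, out_degree)} counting distinct neighbors."""
    senders_to = defaultdict(set)     # vertex -> people who wrote to it
    recipients_of = defaultdict(set)  # vertex -> people it wrote to
    for src, dst in edges:
        recipients_of[src].add(dst)
        senders_to[dst].add(src)
    vertices = set(senders_to) | set(recipients_of)
    return {v: (len(senders_to[v]), len(recipients_of[v])) for v in vertices}

# Hypothetical keyword-filtered network: {(sender, recipient): message count}.
edges = {("a", "hub"): 3, ("b", "hub"): 1, ("c", "hub"): 7, ("hub", "a"): 2}

table = degree_table(edges)
most_contacted = max(table, key=lambda v: table[v][0])
```

A high in-degree only marks someone as worth a closer look; the vertex's position in the wider network and the message content still need to be examined.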


3.1.10 - 3.1.11 Practitioner & Researcher Agenda

  • Ethics: Monitoring email is legally permissible for most companies but requires transparency to avoid “Big Brother” perceptions.

  • Future Research:

    • Forensic Tools: Improving how we find documents in massive archives.

    • Temporal Changes: Detecting sharp increases/decreases in communication that signal major events (rumors, reorganizations, or crises).

    • Geo-location: Tying email to physical office positions to understand social flow.
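To make the Temporal Changes item above concrete, here is one simple way to flag sharp shifts in message volume: compare each week with the mean and standard deviation of the preceding weeks. The window size, the threshold, and the weekly counts are all assumptions; this is a basic heuristic, not a method taken from the chapter.

```python
from statistics import mean, stdev

def flag_shifts(weekly_counts, window=4, threshold=2.0):
    """Return indices of weeks that deviate from the trailing-window mean
    by more than `threshold` standard deviations."""
    flagged = []
    for i in range(window, len(weekly_counts)):
        history = weekly_counts[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Flat history: treat any change at all as a shift.
            if weekly_counts[i] != mu:
                flagged.append(i)
            continue
        if abs(weekly_counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

# Hypothetical weekly message totals for one unit.
counts = [100, 104, 98, 102, 101, 180, 99]
spikes = flag_shifts(counts)
```

Note that the week after a spike is judged against a window that still contains the spike, so the inflated standard deviation can hide the return to normal; a more careful analysis would handle that explicitly.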