Ernesto Herrera Salinas

Posted on Jan 28, 2025 • Edited on Jan 30, 2025

The Lazy Engineer’s Guide to Automating Timesheets: Part 1

#python #automation #git #csv

Ah, timesheets. The Lex Luthor of every software engineer’s existence. If you’re like me, you’d rather debug a race condition at 3 a.m. than spend even five minutes documenting what you did all day. But alas, as freelancers or full-time employees, we’re often stuck with this tedious chore.

This year, I hit my breaking point. After a whirlwind of projects—some canceled, some pivoted, and others shelved indefinitely—I found myself staring down the barrel of a year-end timesheet deadline. The thought of manually reconstructing my entire year’s work made me want to scream. So, I did what any self-respecting engineer would do: I decided to automate the problem away.

This is the story of how I turned a day of dread into a coding adventure. Buckle up—it’s going to be a lazy, efficient ride.

The Problem: Timesheets Are the Worst

Let’s set the scene:

The Task: Document every hour spent on every task for the entire year.
The Obstacle: My memory is about as reliable as a flaky integration test.
The Deadline: One day. Yes, one day.

I knew I couldn’t do this manually. So, I hatched a plan to pull data from all the tools I use daily—JIRA, Git, Slack, and Outlook—and stitch it together into a coherent timesheet.

The Tools of the Trade

Here’s what I used:

JIRA: For tracking tasks and tickets.
Git: For commit history (because every good engineer ties commits to tickets, right?).
Slack: For team communication (because meetings and messages count too).
Outlook: For calendar events (because apparently, meetings are work too).

Step 1: Pulling JIRA Tickets

My first stop was JIRA. I needed to pull all the tickets assigned to me within a specific date range. Thankfully, JIRA has a robust API, and with a little Python magic, I was able to automate this process.

The Script

Here’s the Python script I wrote to fetch JIRA tickets:

import os
from jira import JIRA
import pandas as pd
from datetime import datetime
import logging
import sys
from typing import List, Dict, Any
import argparse

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

class JiraTicketPuller:
    def __init__(self, server: str, email: str, api_token: str):
        """
        Initialize JIRA client with authentication credentials.

        Args:
            server: JIRA server URL
            email: User's email for authentication
            api_token: JIRA API token
        """
        try:
            self.jira = JIRA(server=server, basic_auth=(email, api_token))
            logger.info("Successfully connected to JIRA")
        except Exception as e:
            logger.error(f"Failed to connect to JIRA: {str(e)}")
            sys.exit(1)

    def get_user_tickets(
        self, 
        username: str, 
        start_date: str, 
        end_date: str
    ) -> List[Dict[str, Any]]:
        """
        Retrieve all tickets assigned to a user within a date range.

        Args:
            username: JIRA username to search for
            start_date: Start date in YYYY-MM-DD format
            end_date: End date in YYYY-MM-DD format

        Returns:
            List of dictionaries containing ticket information
        """
        try:
            # Construct JQL query
            jql_query = (
                f'assignee = {username} AND '
                f'created >= {start_date} AND '
                f'created <= {end_date} '
                'ORDER BY created DESC'
            )

            # Fields to retrieve
            fields = [
                'summary',
                'status',
                'priority',
                'issuetype',
                'created',
                'updated',
                'project',
                'components',
                'description'
            ]

            # Get issues using JQL
            issues = self.jira.search_issues(
                jql_query,
                maxResults=1000,  # Adjust based on your needs
                fields=fields
            )

            tickets_data = []
            for issue in issues:
                ticket = {
                    'Key': issue.key,
                    'Summary': issue.fields.summary,
                    'Status': str(issue.fields.status),
                    'Priority': str(issue.fields.priority),
                    'Issue Type': str(issue.fields.issuetype),
                    'Created': issue.fields.created[:10],  # Get date only
                    'Updated': issue.fields.updated[:10],  # Get date only
                    'Project': str(issue.fields.project),
                    'Components': ', '.join([str(c) for c in issue.fields.components]),
                    'Description': issue.fields.description or ''
                }
                tickets_data.append(ticket)

            logger.info(f"Retrieved {len(tickets_data)} tickets for user {username}")
            return tickets_data

        except Exception as e:
            logger.error(f"Error retrieving tickets: {str(e)}")
            return []

    def export_to_csv(self, tickets: List[Dict[str, Any]], output_file: str):
        """
        Export tickets data to CSV file.

        Args:
            tickets: List of ticket dictionaries
            output_file: Path to output CSV file
        """
        try:
            if not tickets:
                logger.warning("No tickets to export")
                return

            df = pd.DataFrame(tickets)
            df.to_csv(output_file, index=False, encoding='utf-8')
            logger.info(f"Successfully exported tickets to {output_file}")

        except Exception as e:
            logger.error(f"Error exporting to CSV: {str(e)}")

def validate_date(date_str: str) -> bool:
    """Validate date string format (YYYY-MM-DD)."""
    try:
        datetime.strptime(date_str, '%Y-%m-%d')
        return True
    except ValueError:
        return False

def main():
    # Set up argument parser
    parser = argparse.ArgumentParser(
        description='Pull JIRA tickets for a user within a date range'
    )
    parser.add_argument('--username', required=True, help='JIRA username')
    parser.add_argument('--start-date', required=True, help='Start date (YYYY-MM-DD)')
    parser.add_argument('--end-date', required=True, help='End date (YYYY-MM-DD)')
    parser.add_argument(
        '--output', 
        default='jira_tickets.csv', 
        help='Output CSV file path'
    )

    args = parser.parse_args()

    # Validate dates
    if not all(validate_date(date) for date in [args.start_date, args.end_date]):
        logger.error("Invalid date format. Please use YYYY-MM-DD")
        sys.exit(1)

    # Configuration
    JIRA_SERVER = os.getenv("JIRA_SERVER")
    JIRA_EMAIL = os.getenv("JIRA_EMAIL")
    JIRA_API_TOKEN = os.getenv("JIRA_API_TOKEN")

    # Initialize ticket puller
    puller = JiraTicketPuller(JIRA_SERVER, JIRA_EMAIL, JIRA_API_TOKEN)

    # Get and export tickets
    tickets = puller.get_user_tickets(
        args.username,
        args.start_date,
        args.end_date
    )
    puller.export_to_csv(tickets, args.output)

if __name__ == "__main__":
    # Ensure JIRA API token is set
    if not os.getenv("JIRA_API_TOKEN"):
        logging.error("JIRA_API_TOKEN environment variable not set")
        return
    main()

How It Works

Authentication: The script uses your JIRA email and API token to authenticate.
JQL Query: It constructs a JQL query to fetch tickets assigned to you within a date range.
Data Export: The results are exported to a CSV file for easy analysis.

Step 2: Fetching Git Commits

Next, I turned to Git. Since our team follows a practice of including JIRA ticket IDs in commit messages, I wrote a script to extract commit data and map it to tickets.

The Script

import subprocess
import csv
import re

def get_git_commits(since_date=None, author=None):
    # Get commit data with full commit message
    cmd = ['git', 'log', '--pretty=format:%h|%ad|%s', '--date=iso']

    if since_date:
        cmd.extend(['--since', since_date])
    if author:
        cmd.extend(['--author', author])

    result = subprocess.run(cmd, capture_output=True, text=True)
    commits = result.stdout.strip().split('\n')

    # Process commits to extract JIRA IDs
    processed_commits = []
    jira_pattern = re.compile(r'JIRA:\s*([A-Z]+-\d+)')

    for commit in commits:
        if commit:  # Skip empty lines
            hash_id, date, message = commit.split('|', 2)
            jira_match = jira_pattern.search(message)
            jira_id = jira_match.group(1) if jira_match else ''

            processed_commits.append({
                'date': date,
                'commit': hash_id,
                'jira_id': jira_id
            })

    # Write to CSV
    csv_filename = 'git_commits_jira.csv'
    fieldnames = ['date', 'commit', 'jira_id']

    with open(csv_filename, 'w', newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(processed_commits)

    return processed_commits

How It Works

Git Log: The script uses git log to fetch commit history.
JIRA ID Extraction: It uses regex to extract JIRA ticket IDs from commit messages.
CSV Export: The results are saved to a CSV file for later use.

Step 3: Tackling Slack Messages

Slack was trickier. Messages are context-heavy, and mapping them to specific tasks isn’t straightforward. I briefly considered using AI to parse the data but decided against it due to cost and complexity. Instead, I created a generic ticket to capture communication time and wrote a script to fetch Slack messages.

The Script

import os
from datetime import datetime
from slack_sdk import WebClient
from slack_sdk.errors import SlackApiError
import pandas as pd

def get_all_conversations(client):
    """
    Get all conversations (channels, private channels, and DMs) the bot has access to.

    Args:
        client: Slack WebClient instance

    Returns:
        list: List of conversation objects
    """
    conversations = []

    try:
        # Get all conversations (channels, private channels, and DMs)
        result = client.conversations_list(
            types="public_channel,private_channel,im,mpim",
            limit=1000
        )
        conversations.extend(result["channels"])

        # Handle pagination
        while result.get("response_metadata", {}).get("next_cursor"):
            cursor = result["response_metadata"]["next_cursor"]
            result = client.conversations_list(
                types="public_channel,private_channel,im,mpim",
                cursor=cursor,
                limit=1000
            )
            conversations.extend(result["channels"])

    except SlackApiError as e:
        print(f"Error fetching conversations: {e.response['error']}")

    return conversations

def get_user_info(client, user_id):
    """
    Get user information from their ID.

    Args:
        client: Slack WebClient instance
        user_id: User's Slack ID

    Returns:
        dict: User information including real name and email
    """
    try:
        result = client.users_info(user=user_id)
        user = result["user"]
        return {
            "real_name": user.get("real_name", "Unknown"),
            "email": user.get("profile", {}).get("email", "Unknown"),
            "display_name": user.get("profile", {}).get("display_name", "Unknown")
        }
    except SlackApiError:
        return {"real_name": "Unknown", "email": "Unknown", "display_name": "Unknown"}

def get_messages_from_conversation(client, channel_id, channel_name, start_date, end_date):
    """
    Retrieve messages from a specific conversation within a date range.

    Args:
        client: Slack WebClient instance
        channel_id (str): The ID of the Slack channel
        channel_name (str): The name of the Slack channel
        start_date (str): Start date in format 'YYYY-MM-DD'
        end_date (str): End date in format 'YYYY-MM-DD'

    Returns:
        list: List of messages within the date range
    """
    messages = []
    start_timestamp = int(datetime.strptime(start_date, '%Y-%m-%d').timestamp())
    end_timestamp = int(datetime.strptime(end_date, '%Y-%m-%d').timestamp())

    try:
        # Initialize the cursor for pagination
        cursor = None
        while True:
            # Get messages using conversations.history
            result = client.conversations_history(
                channel=channel_id,
                oldest=start_timestamp,
                latest=end_timestamp,
                limit=100,
                cursor=cursor
            )

            # Process messages
            for msg in result['messages']:
                user_id = msg.get('user')
                user_info = get_user_info(client, user_id) if user_id else {
                    "real_name": "Unknown",
                    "email": "Unknown",
                    "display_name": "Unknown"
                }

                message_data = {
                    'channel_name': channel_name,
                    'channel_id': channel_id,
                    'text': msg.get('text', ''),
                    'user_id': user_id,
                    'user_name': user_info['real_name'],
                    'user_email': user_info['email'],
                    'timestamp': datetime.fromtimestamp(float(msg['ts'])).strftime('%Y-%m-%d %H:%M:%S'),
                    'thread_ts': msg.get('thread_ts', None)
                }
                messages.append(message_data)

            # Check if there are more messages to fetch
            cursor = result.get('response_metadata', {}).get('next_cursor')
            if not cursor:
                break

    except SlackApiError as e:
        print(f"Error fetching messages from {channel_name}: {e.response['error']}")

    return messages

def main():
    # Set your date range
    start_date = "2024-01-01"
    end_date = "2024-01-31"

    # Initialize the Slack client
    slack_token = os.environ.get('SLACK_BOT_TOKEN')
    if not slack_token:
        raise ValueError("Please set the SLACK_BOT_TOKEN environment variable")

    client = WebClient(token=slack_token)

    # Get all conversations
    print("Fetching all conversations...")
    conversations = get_all_conversations(client)

    # Fetch messages from all conversations
    all_messages = []
    total_conversations = len(conversations)

    print(f"Found {total_conversations} conversations. Fetching messages...")

    for idx, conv in enumerate(conversations, 1):
        channel_id = conv["id"]
        channel_name = conv.get("name", "DM")  # DMs don't have names

        print(f"Processing {idx}/{total_conversations}: {channel_name}")

        messages = get_messages_from_conversation(
            client,
            channel_id,
            channel_name,
            start_date,
            end_date
        )
        all_messages.extend(messages)

    # Convert to DataFrame for easier handling
    df = pd.DataFrame(all_messages)

    # Save to CSV
    output_file = f"slack_messages_{start_date}_to_{end_date}.csv"
    df.to_csv(output_file, index=False, encoding='utf-8')

    print(f"\nTotal messages retrieved: {len(all_messages)}")
    print(f"Messages saved to: {output_file}")

    # Print summary
    print("\nMessages per channel:")
    channel_summary = df.groupby('channel_name').size().sort_values(ascending=False)
    print(channel_summary)

if __name__ == "__main__":
    main()

How It Works

Conversation List: The script fetches all channels and DMs accessible to the bot.
Message Retrieval: It retrieves messages within a specified date range.
CSV Export: The messages are saved to a CSV file for further analysis.

Step 4: Capturing Outlook Meetings

Finally, I needed to account for meetings. Using the exchangelib Python library, I wrote a script to pull calendar events and export them to a CSV.

The Script

from exchangelib import Credentials, Account
import pandas as pd
from datetime import datetime, timedelta

def get_outlook_meetings():
    # Replace with your email and password
    email = os.environ.get("JIRA_EMAIL")
    password = os.environ.get('OUTLOOK_PASSWORD')

    # Connect to Exchange
    credentials = Credentials(email, password)
    account = Account(email, credentials=credentials, autodiscover=True)

    # Get calendar items for the last 30 days (adjust as needed)
    start_date = datetime.now() - timedelta(days=30)
    end_date = datetime.now() + timedelta(days=30)

    # Get calendar items
    calendar_items = account.calendar.view(
        start=start_date,
        end=end_date
    )

    # Prepare data structure
    meetings = []

    # Process each appointment
    for item in calendar_items:
        # Calculate duration in minutes
        duration = int((item.end - item.start).total_seconds() / 60)

        meetings.append({
            'date': item.start.strftime('%Y-%m-%d'),
            'time': item.start.strftime('%H:%M'),
            'duration': duration,
            'title': item.subject
        })

    # Create DataFrame and save to CSV
    df = pd.DataFrame(meetings)
    df.to_csv('outlook_meetings.csv', index=False)

    return df

# Run the function
if __name__ == "__main__":
    meetings_df = get_outlook_meetings()
    print(f"Exported {len(meetings_df)} meetings to outlook_meetings.csv")

How It Works

Authentication: The script uses your Outlook email and password to authenticate.
Calendar Query: It fetches calendar events within a specified date range.
CSV Export: The events are saved to a CSV file.

What’s Next?

At this point, I had four CSV files:

JIRA Tickets: All the tasks I worked on.
Git Commits: All the code I wrote.
Slack Messages: All the communication I participated in.
Outlook Meetings: All the meetings I attended.

In Part 2, I’ll show you how I stitched these datasets together to create a comprehensive timesheet. Spoiler alert: It involves more Python, some data wrangling, and a little bit of magic.

Stay tuned, and remember: Laziness is the mother of invention.

What’s your least favorite chore as a software engineer? Have you automated it yet? Share your stories in the comments!

DEV Community

The Lazy Engineer’s Guide to Automating Timesheets: Part 1

The Problem: Timesheets Are the Worst

The Tools of the Trade

Step 1: Pulling JIRA Tickets

The Script

How It Works

Step 2: Fetching Git Commits

The Script

How It Works

Step 3: Tackling Slack Messages

The Script

How It Works

Step 4: Capturing Outlook Meetings

The Script

How It Works

What’s Next?

Top comments (0)