
Samuel S


Fake news detection with Keras and Python

In the digital age, where information spreads faster than ever, fake news has become a real threat. Whether it's misleading headlines or entirely fabricated stories, false information can influence public opinion, sway elections, or even endanger lives. To combat this, I built a sample machine learning-based fake news detection model using TensorFlow and natural language processing (NLP).

Overview

This project demonstrates a practical implementation of Fake News Detection using TensorFlow, leveraging Natural Language Processing (NLP) techniques and deep learning. The model is trained to classify whether a given news article is real or fake, which is a crucial task in today's information-driven society.

Technologies Used

  • TensorFlow 2.x – for building and training the neural network
  • Keras – high-level API for defining the model
  • Natural Language Processing (NLP) – tokenization, stopword removal, vectorization
  • Scikit-learn – for preprocessing, splitting datasets, and evaluating model accuracy
  • Pandas & NumPy – for data manipulation
  • Matplotlib – for result visualization

Features

  • Binary classification: Fake (0) vs. Real (1)
  • Dataset preprocessing: stopwords, stemming, tokenization
  • Deep learning model built with Keras (the sample below uses a simple embedding-based classifier; LSTM layers are a natural extension)
  • Training and validation performance tracking
  • Organized Jupyter notebook for easy understanding
  • Model evaluation with confusion matrix, accuracy, and loss visualization
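The preprocessing steps listed above can be sketched as follows. This is a simplified, self-contained illustration: the stopword set and suffix-stripping stemmer are stand-ins for NLTK's full stopword list and PorterStemmer, which the actual notebook would use.

```python
import re

# Tiny stand-in stopword set; in practice use nltk.corpus.stopwords.words('english')
STOPWORDS = {"the", "a", "an", "is", "are", "was", "were", "in", "on", "of", "to", "and"}

def simple_stem(word):
    # Naive stand-in for a real stemmer (e.g. NLTK's PorterStemmer):
    # strip a few common suffixes when the remaining stem is long enough.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    tokens = re.findall(r"[a-z']+", text.lower())        # tokenize
    tokens = [t for t in tokens if t not in STOPWORDS]   # remove stopwords
    return [simple_stem(t) for t in tokens]              # stem
```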

Dataset

The model uses a labeled dataset of news articles consisting of:

  • Title and Text
  • Label: 0 for fake news, 1 for real news

You can easily swap in another dataset or extend it with more complex sources for multilingual or multi-category classification.
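Swapping in another dataset mostly means mapping its columns onto the text/label schema the training script expects. Here is a hedged sketch; the "headline" and "verdict" column names are hypothetical, not from a real dataset.

```python
import pandas as pd

def normalize_dataset(df, text_col="headline", label_col="verdict"):
    # Map string labels onto the script's convention: 0 = fake, 1 = real.
    label_map = {"fake": 0, "real": 1}
    out = pd.DataFrame({
        "text": df[text_col].astype(str),
        "label": df[label_col].str.lower().map(label_map),
    })
    # Drop rows whose label didn't match the mapping.
    return out.dropna(subset=["label"])
```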

Model Performance

After training with appropriate preprocessing and regularization, the model reliably distinguishes real from fake articles. Detailed metrics and visual plots are available in the notebook for transparency and fine-tuning.
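The evaluation step mentioned above might look like this sketch, which turns the sigmoid outputs into hard 0/1 predictions and computes the accuracy and confusion matrix with scikit-learn. It assumes you have the held-out y_test and the model's predicted probabilities from the training script.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

def evaluate(y_test, probs, threshold=0.5):
    # Threshold the sigmoid outputs to get 0/1 predictions.
    preds = (np.asarray(probs) >= threshold).astype(int)
    # Rows are true labels (0 = fake, 1 = real), columns are predictions.
    cm = confusion_matrix(y_test, preds, labels=[0, 1])
    return accuracy_score(y_test, preds), cm
```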

Here is the sample code for detect_fake_news.py:

import pandas as pd
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split
import json
import os

MAX_LENGTH = 500
VOCAB_SIZE = 10000
USER_DATA_PATH = "user_training_data.csv"
FAKE_CSV_PATH = "data/Fake.csv"

def train_model(extra_texts=None, extra_labels=None):
    # Load base datasets
    df_fake = pd.read_csv(FAKE_CSV_PATH)
    df_fake['label'] = 0

    df_true = pd.read_csv("data/True.csv")
    df_true['label'] = 1

    df = pd.concat([df_fake[['text', 'label']], df_true[['text', 'label']]])

    # Load and include user data if available
    if os.path.exists(USER_DATA_PATH):
        user_df = pd.read_csv(USER_DATA_PATH)
        df = pd.concat([df, user_df], ignore_index=True)

    # Append new user data if provided
    if extra_texts and extra_labels:
        new_data = pd.DataFrame({"text": extra_texts, "label": extra_labels})
        df = pd.concat([df, new_data], ignore_index=True)

        # Append new fake-labeled rows to Fake.csv (note: the column order must
        # match Fake.csv's existing schema for this append to round-trip cleanly)
        fake_data = new_data[new_data['label'] == 0]  # Only append if label is fake
        if not fake_data.empty:
            fake_data.to_csv(FAKE_CSV_PATH, mode='a', header=False, index=False)

        # Save new data to user_training_data.csv
        new_data.to_csv(USER_DATA_PATH, mode='a', header=not os.path.exists(USER_DATA_PATH), index=False)

    df.dropna(inplace=True)

    # Tokenize and train model
    tokenizer = Tokenizer(num_words=VOCAB_SIZE, oov_token="<OOV>")
    tokenizer.fit_on_texts(df['text'])
    sequences = tokenizer.texts_to_sequences(df['text'])
    padded = pad_sequences(sequences, maxlen=MAX_LENGTH, padding='post', truncating='post')
    X_train, X_test, y_train, y_test = train_test_split(padded, df['label'], test_size=0.2, random_state=42)

    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(VOCAB_SIZE, 16, input_length=MAX_LENGTH),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(24, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])

    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit(X_train, y_train, epochs=5, validation_data=(X_test, y_test))

    model.save('fake_or_true_news_model.keras')
    with open("tokenizer.json", "w") as f:
        f.write(tokenizer.to_json())

    print("✅ Model trained and saved successfully.")
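Once trained, the saved model and tokenizer could be used for prediction along these lines. This is a sketch, not part of the script above: predict_news and interpret are hypothetical helpers, and the file names simply reuse the paths the training script saves to.

```python
def interpret(score, threshold=0.5):
    # The training script labels real news 1 and fake news 0.
    return "Real" if score >= threshold else "Fake"

def predict_news(text, model_path="fake_or_true_news_model.keras",
                 tokenizer_path="tokenizer.json", max_length=500):
    # TensorFlow is imported lazily so the small helpers above work without it.
    import tensorflow as tf
    from tensorflow.keras.preprocessing.text import tokenizer_from_json
    from tensorflow.keras.preprocessing.sequence import pad_sequences

    model = tf.keras.models.load_model(model_path)
    with open(tokenizer_path) as f:
        tokenizer = tokenizer_from_json(f.read())

    seq = tokenizer.texts_to_sequences([text])
    padded = pad_sequences(seq, maxlen=max_length, padding="post", truncating="post")
    score = float(model.predict(padded, verbose=0)[0][0])
    return interpret(score), score
```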


Full Sample Code
