TensorFlow: the ML elephant that's still standing

#english #machinelearning #deeplearning #arquitectura

This is post #4 in the Awesome Curated: The Tools series — where I do deep dives on the tools that pass the filter of our automated curation system. If you landed here directly, you might also want to check out how m2cgen lets you export ML models without shipping Python to production, because it connects pretty directly to what we're talking about today.

I was in an architecture meeting a few months ago. A team wanted to tear down their ML production stack and migrate everything to PyTorch because "TensorFlow is old and nobody uses it anymore." The main argument was that all the recent papers use PyTorch. I asked: where does the model run today? On a server on Google Cloud. Are there mobile endpoints? Yes — an iOS and Android app. How much traffic? Millions of requests per day.

I told them migrating wasn't necessarily a bad idea, but asked them to walk me through how much effort it would take to rewrite the deployment pipeline, the production serving layer, and the compiled TFLite model running on those mobile devices. Silence. The migration makes technical sense in an ideal world where you have six months and zero users waiting. In the real world, TensorFlow is still the answer when deployment matters more than the elegance of your training code.

And that's exactly what puts it here, on the list.

What it does

TensorFlow is Google's open-source machine learning framework. It started as a C++ library with Python bindings, and that's not a minor detail — the performance core is written in C++ and CUDA, and the Python API is essentially a very powerful wrapper over that. Today it has over 185k GitHub stars, which makes it one of the most starred repos on the entire platform.

The core proposition: you describe your model as a computational graph, and TF optimizes and executes it. TF2 made this much friendlier by turning eager execution on by default (operations run line by line, just like in PyTorch), but the real performance kicks in when you use @tf.function to compile Python functions into optimized graphs. Keras, which ships built into TF2, already does this for you: model.fit runs its training step as a compiled tf.function by default. A basic classifier looks like this:

import tensorflow as tf

# Load MNIST and flatten the 28x28 images into 784-feature vectors in [0, 1]
(X_train, y_train), _ = tf.keras.datasets.mnist.load_data()
X_train = X_train.reshape(-1, 784).astype('float32') / 255.0

# Define the model. I'm using Keras, which ships built into TF2
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),  # Regularization to prevent overfitting
    tf.keras.layers.Dense(10, activation='softmax')  # 10 output classes
])

# Compile with optimizer and loss function
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Train, holding out 20% of the training set for validation
model.fit(X_train, y_train, epochs=10, validation_split=0.2)
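Keras hides the compilation step, so here is a minimal sketch of what @tf.function looks like when you write the training step yourself. It assumes TF 2.x with its bundled Keras; `train_step` is a hypothetical name (not a TF API), and the random batch stands in for real data.

```python
import numpy as np
import tensorflow as tf

# Same architecture as above, rebuilt so this sketch is self-contained
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax'),
])
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

@tf.function  # traced into a graph on the first call, reused on later calls
def train_step(x, y):
    with tf.GradientTape() as tape:
        probs = model(x, training=True)  # training=True activates Dropout
        loss = loss_fn(y, probs)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# Hypothetical random batch standing in for real data
rng = np.random.default_rng(0)
x_batch = tf.constant(rng.random((32, 784), dtype=np.float32))
y_batch = tf.constant(rng.integers(0, 10, size=(32,)))

for step in range(5):
    loss = train_step(x_batch, y_batch)  # same shapes/dtypes: no retracing
    print(f'step {step}: loss={float(loss):.4f}')
```

Because every call uses the same input shapes and dtypes, TF traces `train_step` once and then runs the compiled graph on each loop iteration.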

But where TF really shines is the deployment ecosystem. TFLite converts trained models into compact versions for mobile and edge devices, and post-training quantization (storing float32 weights as 8-bit integers) typically cuts model size by about 4x with only a small loss in accuracy. TensorFlow Serving is a production model server that loads new model versions without downtime, serves several versions side by side, and exposes gRPC and REST APIs; you scale it horizontally by running replicas behind a load balancer. TensorFlow.js runs models in the browser and in Node.js. It's an entire ecosystem, not just a training library.

# Export to TFLite for mobile deployment
# (model is the trained Keras model from the previous snippet)
converter = tf.lite.TFLiteConverter.from_keras_model(model)

# Dynamic range quantization: weights stored as int8,
# no retraining and no calibration dataset needed
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Convert: the output is a .tflite flatbuffer that ships to iOS/Android
tflite_model = converter.convert()

# Save to disk
with open('optimized_model.tflite', 'wb') as f:
    f.write(tflite_model)

# With int8 weights the file is roughly 4x smaller than the float32 model,
# and it runs on-device without needing Python
print(f'TFLite model size: {len(tflite_model) / 1024:.1f} KB')
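To check the size claim and confirm the converted model still produces sensible output, here is a minimal sketch that converts the same small model with and without quantization and runs one inference through `tf.lite.Interpreter`. It assumes a TF 2.x release that still ships `tf.lite.Interpreter` (newer releases point to the standalone LiteRT package instead); the untrained stand-in model and random input are hypothetical.

```python
import numpy as np
import tensorflow as tf

# Untrained stand-in model: enough to compare sizes and run inference
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])

def convert(keras_model, quantize):
    converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
    if quantize:
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
    return converter.convert()

float_model = convert(model, quantize=False)
quant_model = convert(model, quantize=True)
print(f'float32: {len(float_model) / 1024:.1f} KB, '
      f'quantized: {len(quant_model) / 1024:.1f} KB')

# Load the quantized flatbuffer and run a single inference
interpreter = tf.lite.Interpreter(model_content=quant_model)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

sample = np.random.default_rng(0).random((1, 784), dtype=np.float32)
interpreter.set_tensor(inp['index'], sample)
interpreter.invoke()
probs = interpreter.get_tensor(out['index'])
print(probs.shape, float(probs.sum()))  # softmax output should sum to ~1
```

Inputs and outputs stay float32 with dynamic range quantization; only the stored weights shrink, which is why no calibration data is needed.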

Why it made the list

The curation system picked it up in 6 independent awesome lists. That doesn't happen because of hype — it happens because 6 different communities, with different criteria, all reached the same conclusion: this is a tool you can't ignore. And the verdict from both the AI analysis and my own review was GEM, which in our system means exactly what it sounds like: something with real, lasting value.

What differentiates TF from PyTorch in this context isn't which one trains models better. In that game, honestly, PyTorch won the cultural war, especially in research. What sets TF apart is the deployment story. Nothing in the PyTorch ecosystem is anywhere near as mature as TFLite for production mobile use. TorchScript and PyTorch Mobile exist, and ExecuTorch is the newer on-device runtime, but if you've ever tried to integrate a PyTorch model into a native iOS app, you already know the pain compared to TFLite. TensorFlow Serving came out of Google's own production serving needs and has been handling brutal workloads there for years; that kind of robustness is something you simply can't manufacture overnight.
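What TensorFlow Serving actually consumes is a SavedModel on disk, organized as numbered version folders under a model directory. Here is a minimal sketch of producing that layout with Keras' `model.export` (available in recent TF 2.x / Keras releases); the `my_model` name, the temp directory, and the tiny model are hypothetical.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Tiny stand-in model; any trained Keras model exports the same way
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# TF Serving watches a base directory and loads numeric version subfolders:
#   <base>/my_model/1/saved_model.pb, <base>/my_model/2/..., and so on
base_dir = tempfile.mkdtemp()
export_dir = os.path.join(base_dir, 'my_model', '1')  # hypothetical name
os.makedirs(os.path.dirname(export_dir), exist_ok=True)
model.export(export_dir)  # writes an inference-only SavedModel

# Reload without any Keras objects to check the exported 'serve' endpoint
reloaded = tf.saved_model.load(export_dir)
x = np.random.default_rng(0).random((2, 784), dtype=np.float32)
preds = reloaded.serve(x)
print(preds.shape)
```

A TF Serving instance (for example the official `tensorflow/serving` Docker image with `MODEL_NAME=my_model` and the `my_model` folder mounted under `/models/`) would then expose predictions over REST at `/v1/models/my_model:predict`. Dropping a `2/` folder next to `1/` is how a new version gets rolled out.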

The other factor is Google Cloud. If your infrastructure lives there, the native integration with Vertex AI (which replaced the older Cloud ML Engine and AI Platform) and the rest of the GCP ecosystem is a real multiplier. It's not ideological lock-in; it's architectural pragmatism.

When NOT to use it

If you're learning ML from scratch or doing research, PyTorch (github.com/pytorch/pytorch) will make your life much simpler. The API is more Pythonic, and debugging is more intuitive: TF2 is also eager by default, but its fast path runs through @tf.function graphs, where Python side effects like print() only fire during tracing, while PyTorch's default path stays plain eager Python. The research community lives there too, which means most new papers ship PyTorch code, not TF. The historical baggage of TF1 vs TF2 still haunts Stack Overflow: you find contradictory answers because people mix versions without saying which one they mean. It's disorienting.
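To make the debugging point concrete, here is a minimal sketch of the classic tf.function surprise, assuming TF 2.x; `normalize` is a hypothetical function. Python's `print()` runs only while TF traces the function into a graph, whereas `tf.print()` becomes a graph op and runs on every call.

```python
import tensorflow as tf

@tf.function
def normalize(x):
    # Python print: executes only during tracing, and shows a symbolic
    # tensor rather than actual values
    print('tracing normalize, x =', x)
    # tf.print: part of the graph, executes on every call with real values
    tf.print('running normalize, mean =', tf.reduce_mean(x))
    return (x - tf.reduce_mean(x)) / tf.math.reduce_std(x)

a = normalize(tf.constant([1.0, 2.0, 3.0]))  # traces: both messages appear
b = normalize(tf.constant([4.0, 5.0, 6.0]))  # same signature: only tf.print
```

If you drop a breakpoint or a plain `print` inside a tf.function and it seems to fire once and then go silent, this tracing behavior is why.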

I wouldn't use it for small projects where deployment is just a normal Python web server either. In that case, you can train with whatever you want and export with m2cgen if it's a classical model that m2cgen supports (linear models, tree ensembles and the like), or serve with FastAPI + pickle if you don't need scale. TF adds real complexity, so use it when the problem justifies it.

And if your team has nobody with TF experience, the onboarding cost for a new project probably isn't worth it unless edge or mobile deployment is a concrete requirement from day zero.

TF is still standing, and there are reasons for that

What pushed me to confirm the human GEM verdict is this: TensorFlow isn't on 6 lists because it's trending. It's there because it solves production problems that others don't solve as well. It's the kind of tool you won't pick out of enthusiasm — you'll pick it out of necessity. And when you need it, you'll be glad it exists and that it's spent a decade getting battle-tested.

This is post #4 in Awesome Curated: The Tools. The series continues — every tool that shows up here has been through a community signal filter, AI analysis, and a human verdict before making the cut. If you're particularly interested in the ML angle, the post on m2cgen pairs really well with this one: it's exactly the other side of the coin, when the model is already trained and you need to get Python out of your production stack entirely.

This article was originally published on juanchi.dev