DEV Community: Yash

β or w for weight?

Yash — Sun, 25 Jan 2026 19:00:07 +0000

Which one do you use for the weight, β or w, in ŷ = x * (?) + (?) ?

There are many ways to write the linear regression equation. Let's look at two of them:

1. ŷ = x * w + b

where,
   ŷ -> predicted value
   x -> input (feature)
   w -> weight factor
   b -> bias

2. ŷ = x * β + β₀

where,
   ŷ -> predicted value
   x -> input (feature)
   β -> weight factor
   β₀ -> bias
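
To make it concrete that the two forms are the same computation with different symbol names, here is a minimal Python sketch. The function names `predict_cs` and `predict_stats` and the sample numbers are made up for illustration; they are not from any library or dataset.

```python
def predict_cs(x, w, b):
    """CS / ML naming: y_hat = x * w + b."""
    return x * w + b


def predict_stats(x, beta, beta_0):
    """Statistics naming: y_hat = x * beta + beta_0."""
    return x * beta + beta_0


# Illustrative values only (assumed, not taken from real data).
x = 3.0
slope = 2.0
intercept = 1.0

y_hat_cs = predict_cs(x, w=slope, b=intercept)
y_hat_stats = predict_stats(x, beta=slope, beta_0=intercept)

print(y_hat_cs, y_hat_stats)  # both are 7.0
```

Only the argument names differ, so any result you read in one notation carries over to the other.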

When your teacher teaches you machine learning basics, they will most likely use the first naming convention, which is fine. But when you stumble upon a research paper, the story changes: you start to think your knowledge is holding you back :( and that keeps you from reading papers at all.

Is there a specific reason why this happens?
The answer is yes.

The naming convention changes with the field. When you first enter the field of AI, you start with machine learning, and more specifically with linear regression. Here, the inputs (features/attributes/columns) are used to predict ŷ (y-hat, the prediction), but not every input has the same importance. The weight decides how strongly each input influences ŷ. That is the usual explanation of linear regression from a Computer Science perspective, which is why the name w is meaningful to us. In maths, β (beta) is used for the same quantity, where it is called the slope coefficient.
When you look closely at research papers, or at books that build on earlier papers, books and research, many of their authors come from a maths background rather than a CS one. That is why the equations in those papers and books often use β instead of w.

To reduce this confusion, the table below will help a lot:

Name	Computer Science perspective	Mathematics perspective
Weights	w (Weights)	β (Beta)
Bias	b (bias)	β₀ (intercept)
Learning Rate	α (Alpha)	η (Eta)
Loss/Error	L (Loss)	E (Error)
Cost Function	J (Cost Function)	C (Cost)
Input Data	x (Features)	X (Design Matrix), I (Input)
Output / Label	y (Target)	d (Desired)
Prediction	ŷ (Y-hat)	hθ(x) (Hypothesis)
Prediction	a (Activation)	f(x)
Regularization	λ (Lambda)	α (Alpha)
Be careful: scikit-learn uses α (alpha) for regularisation strength, but deep learning texts often use α for the learning rate!
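
The two roles are easy to tell apart in code. Below is a rough sketch of one-feature linear regression trained by gradient descent with an L2 penalty. The function `fit_ridge_1d`, its argument names, and the toy data are all assumptions chosen for illustration. `learning_rate` is the step size (α or η in the table), while `reg_strength` is the penalty weight (λ in the table, and the role played by the `alpha` argument of scikit-learn's `Ridge`, though the exact scaling there differs from this sketch).

```python
def fit_ridge_1d(xs, ys, learning_rate, reg_strength, steps=2000):
    """Fit y_hat = x * w + b by gradient descent with an L2 penalty on w.

    learning_rate: step size of each update (alpha / eta in the table).
    reg_strength:  weight of the penalty term (lambda in the table).
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of: mean squared error + reg_strength * w**2
        grad_w = sum(2 * (x * w + b - y) * x for x, y in zip(xs, ys)) / n
        grad_w += 2 * reg_strength * w
        grad_b = sum(2 * (x * w + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w_plain, b_plain = fit_ridge_1d(xs, ys, learning_rate=0.05, reg_strength=0.0)
w_ridge, b_ridge = fit_ridge_1d(xs, ys, learning_rate=0.05, reg_strength=1.0)

print(w_plain, b_plain)  # close to 2 and 1
print(w_ridge, b_ridge)  # the penalty pulls w below 2
```

Changing `learning_rate` changes how fast the fit gets there, while changing `reg_strength` changes where it ends up. That is why it matters which of the two a given α refers to.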