This guide was written by one of our data engineers, and we hope it gives a plain-English insight into how the estimated weight is calculated.

Simple

Consider a set of data containing measured weights and ages for a herd of cattle.

| age | weight |
|-----|--------|
| 100 | 100 |
| 120 | 110 |

We can plot this data on a graph of weight versus age and fit a straight line to it of the form:

weight at age = (growth rate * age) + birth weight

We can determine the growth rate (the gradient) and the birth weight (the intercept) either from the graph or with a bit of computation. This is what people call linear regression. And if you like linear algebra, you can solve it with a single matrix equation (the so-called normal equation) - nice!
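For the two-row table, the fit can be worked out in a few lines of plain Python. This is a minimal sketch of ordinary least squares for a single input, not our production code, and the function name `fit_line` is hypothetical. With one input, the matrix equation reduces to: gradient = covariance of age and weight divided by variance of age, with the intercept chosen so the line passes through the average point.

```python
def fit_line(ages, weights):
    """Ordinary least-squares fit of weight = growth_rate * age + birth_weight."""
    n = len(ages)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_age = sum(ages) / n
    mean_weight = sum(weights) / n
    # Gradient = covariance(age, weight) / variance(age)
    cov = sum((a - mean_age) * (w - mean_weight) for a, w in zip(ages, weights))
    var = sum((a - mean_age) ** 2 for a in ages)
    if var == 0:
        raise ValueError("all ages are identical, so the gradient is undefined")
    growth_rate = cov / var
    # The least-squares line always passes through the mean point
    birth_weight = mean_weight - growth_rate * mean_age
    return growth_rate, birth_weight


ages = [100, 120]
weights = [100, 110]
growth_rate, birth_weight = fit_line(ages, weights)
print(growth_rate, birth_weight)  # prints: 0.5 50.0
```

With only two rows the line passes through both points exactly. With more data, least squares picks the line that minimises the sum of squared differences between the measured and predicted weights.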

This is fairly straightforward (no pun intended), but a straight line is probably not representative of how cattle grow. So we can try fitting various curves instead, such as polynomials (e.g. with an age x age, or age squared, term), but we are still guessing the form of the function.
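Guessing a curved form can be sketched by adding powers of age as extra columns and solving the same least-squares problem. This is a minimal plain-Python sketch that assumes a quadratic is the form we guess; the helpers `solve` and `fit_polynomial` are hypothetical, and the data is made up from a known curve so we can check the fit recovers it. In practice a library routine such as NumPy's `numpy.polyfit` would normally be used.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented copy so the caller's lists are left untouched
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: not enough distinct ages for this degree")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution from the last row upwards
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_polynomial(ages, weights, degree):
    """Least-squares fit of weight = c0 + c1*age + c2*age**2 + ...; returns [c0, c1, ...]."""
    # Rescale ages to at most 1 in size so the powers stay well conditioned
    scale = max(abs(a) for a in ages) or 1.0
    xs = [a / scale for a in ages]
    k = degree + 1
    # Design matrix rows: [1, x, x**2, ...]
    rows = [[x ** p for p in range(k)] for x in xs]
    # Normal equations: (X^T X) d = X^T y
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * w for row, w in zip(rows, weights)) for i in range(k)]
    d = solve(xtx, xty)
    # Undo the rescaling: d_p * (age / scale)**p == (d_p / scale**p) * age**p
    return [d_p / scale ** p for p, d_p in enumerate(d)]


# Hypothetical, noise-free data generated from weight = 40 + 0.8*age - 0.001*age**2
ages = [0, 50, 100, 150, 200, 250, 300]
weights = [40 + 0.8 * a - 0.001 * a ** 2 for a in ages]
coeffs = fit_polynomial(ages, weights, degree=2)
print([round(c, 6) for c in coeffs])  # prints: [40.0, 0.8, -0.001]
```

The fitting machinery does not change with the curve; only the columns we choose (our guess at the form) do.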

When we talk about machine learning, we are really just talking about fitting a line or curve to our data. We can use a variety of statistical techniques, of varying complexity, to fit the data to an arbitrary function, but at the end of the day we are simply fitting a line to the data we have.

A bit more complicated

There are other factors that we consider in our data model: sex and breed. If we first add breed to our dataset, we now have a 3-dimensional set of data, so instead of finding a line or curve to describe our data we are looking for a plane or surface. We can just about picture this, but when we add sex we have created a 4-dimensional dataset, which is much harder to visualise... but the principles of fitting the data have not changed.
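One common way to add breed and sex is to turn each category into a 0/1 indicator column (one-hot or dummy encoding) and solve the same least-squares problem with more columns. This is a minimal sketch under stated assumptions: the breed names, coefficients and data are hypothetical, the helpers `encode`, `fit_linear` and `predict` are illustrative, and this simple linear model assumes breed and sex just shift the growth line up or down.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: the columns are not independent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def encode(age, breed, sex):
    """One animal as a numeric row: intercept, age, and 0/1 indicators."""
    # Hypothetical encoding: "Angus" and "female" are the baseline categories
    return [1.0, float(age), 1.0 if breed == "Hereford" else 0.0, 1.0 if sex == "male" else 0.0]


def fit_linear(rows, targets):
    """Least-squares coefficients for targets ~ rows . coeffs, via the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(coeffs, age, breed, sex):
    return sum(c * x for c, x in zip(coeffs, encode(age, breed, sex)))


# Hypothetical noise-free data from: 40 + 0.8*age + 15*(Hereford) + 25*(male)
true_coeffs = [40.0, 0.8, 15.0, 25.0]
animals = [(age, breed, sex)
           for age in (100, 150, 200, 250)
           for breed in ("Angus", "Hereford")
           for sex in ("female", "male")]
rows = [encode(*animal) for animal in animals]
weights = [sum(c * x for c, x in zip(true_coeffs, row)) for row in rows]
coeffs = fit_linear(rows, weights)
print([round(c, 4) for c in coeffs])  # prints: [40.0, 0.8, 15.0, 25.0]
```

Each extra factor simply adds columns (dimensions); the fitting step is unchanged.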

Data scientists will use a number of techniques to fit the data. They may even average the predictions of several fitted models (an ensemble) to get a better fit. There's a bit of skill in selecting the best approach.
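Averaging several fitted models can be sketched with bagging (bootstrap aggregating): refit a straight line on random resamples of the data and average their predictions. Bagging is one common ensembling technique, chosen here for illustration; it is not necessarily what our models use, and the data and function names are hypothetical.

```python
import random
import statistics


def fit_line(ages, weights):
    """Least-squares straight line; returns (growth_rate, intercept)."""
    mean_age = statistics.fmean(ages)
    mean_weight = statistics.fmean(weights)
    cov = sum((a - mean_age) * (w - mean_weight) for a, w in zip(ages, weights))
    var = sum((a - mean_age) ** 2 for a in ages)
    growth_rate = cov / var
    return growth_rate, mean_weight - growth_rate * mean_age


def bagged_lines(ages, weights, n_models=25, seed=0):
    """Fit n_models lines, each on a bootstrap resample (rows drawn with replacement)."""
    if len(set(ages)) < 2:
        raise ValueError("need at least two distinct ages")
    rng = random.Random(seed)  # seeded so the example is repeatable
    n = len(ages)
    models = []
    while len(models) < n_models:
        picks = [rng.randrange(n) for _ in range(n)]
        sample_ages = [ages[i] for i in picks]
        if len(set(sample_ages)) < 2:
            continue  # this resample cannot define a line, so draw again
        models.append(fit_line(sample_ages, [weights[i] for i in picks]))
    return models


def ensemble_predict(models, age):
    """Average the predictions of every fitted line."""
    return statistics.fmean(rate * age + intercept for rate, intercept in models)


# Hypothetical, slightly noisy measurements
ages = [100, 120, 140, 160, 180, 200]
weights = [100, 110, 118, 131, 139, 151]
models = bagged_lines(ages, weights)
print(f"ensemble estimate at age 150: {ensemble_predict(models, 150):.1f}")
```

Averaging smooths out the quirks of any single fit; for a straight line the gain is small, but the same idea is used with much more flexible models.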

Even more complicated

We used a neural network to model the data. This is a bit more complicated to explain, but it essentially combines all the inputs in many different weighted combinations to build one large, flexible function, and then tunes those weights to find the combination that gives the best results. It's a black box, so you cannot gain much physical insight from the model; compare this with our straight line, where the growth rate and birth weight tell us something directly. However, these neural networks tend to give much better predictions. It is important to remember that we are still just fitting a curve or surface to the data.
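To make "tuning the weights" concrete, here is a toy network with one hidden layer of tanh units, trained by gradient descent (backpropagation) on age alone. It is a minimal sketch for illustration only, not our production model (which also uses breed and sex); the layer size, learning rate, number of epochs and data are all hypothetical choices.

```python
import math
import random


def train_tiny_network(xs, ys, hidden=4, epochs=2000, lr=0.05, seed=1):
    """Fit y ~ f(x) with one hidden layer of tanh units, using full-batch gradient descent."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-1, 1) for _ in range(hidden)]  # input -> hidden weights
    b1 = [0.0] * hidden                               # hidden biases
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden -> output weights
    b2 = 0.0                                          # output bias
    n = len(xs)

    def forward(x):
        # Each hidden unit squashes a weighted version of the input
        h = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
        # The output is a weighted combination of the hidden units
        return h, sum(v * hj for v, hj in zip(w2, h)) + b2

    def mse():
        return sum((forward(x)[1] - y) ** 2 for x, y in zip(xs, ys)) / n

    start_loss = mse()
    for _ in range(epochs):
        g_w1, g_b1, g_w2, g_b2 = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden, 0.0
        for x, y in zip(xs, ys):
            h, out = forward(x)
            d_out = 2 * (out - y) / n  # derivative of the mean squared error w.r.t. out
            g_b2 += d_out
            for j in range(hidden):
                g_w2[j] += d_out * h[j]
                # Backpropagate through tanh, whose derivative is 1 - tanh(z)**2
                d_z = d_out * w2[j] * (1 - h[j] ** 2)
                g_w1[j] += d_z * x
                g_b1[j] += d_z
        # Step every parameter a little way downhill
        for j in range(hidden):
            w1[j] -= lr * g_w1[j]
            b1[j] -= lr * g_b1[j]
            w2[j] -= lr * g_w2[j]
        b2 -= lr * g_b2
    return (lambda x: forward(x)[1]), start_loss, mse()


# Hypothetical ages and weights, rescaled to roughly [0, 1] to help training
ages = [0, 50, 100, 150, 200, 250, 300]
weights = [40, 80, 115, 145, 168, 185, 195]
xs = [a / 300 for a in ages]
ys = [w / 200 for w in weights]
predict, loss_before, loss_after = train_tiny_network(xs, ys)
print(f"loss before {loss_before:.4f}, after {loss_after:.4f}")
print(f"estimate at age 120: {predict(120 / 300) * 200:.1f}")
```

The trained weights have no direct meaning like a growth rate, which is why such models are called black boxes, but training is still just reducing the gap between a curve and the data.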

All about the data

This doesn't sound that hard - can't anyone do this? Probably, but it's important that we have the data and the means of collecting it. The saying in data science is: rubbish in, rubbish out. We could in principle have models per farm or region, since cattle in Scotland might grow differently from those in the South West. We can also start to include other data that we collect to make the estimates even better.
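The idea of separate models per region, together with a basic "rubbish in, rubbish out" check, can be sketched as below. This is a hypothetical illustration: the region names come from the text, but the records, the validation rules and the choice of a straight line per region are assumptions, not a description of how our models are built.

```python
from collections import defaultdict


def fit_line(ages, weights):
    """Least-squares straight line; returns (growth_rate, intercept)."""
    n = len(ages)
    mean_age = sum(ages) / n
    mean_weight = sum(weights) / n
    cov = sum((a - mean_age) * (w - mean_weight) for a, w in zip(ages, weights))
    var = sum((a - mean_age) ** 2 for a in ages)
    growth_rate = cov / var
    return growth_rate, mean_weight - growth_rate * mean_age


def fit_per_region(records):
    """Fit one line per region after discarding obviously bad records."""
    by_region = defaultdict(lambda: ([], []))
    for region, age, weight in records:
        # Rubbish in, rubbish out: skip missing or impossible values (hypothetical rules)
        if age is None or weight is None or age < 0 or weight <= 0:
            continue
        ages, weights = by_region[region]
        ages.append(age)
        weights.append(weight)
    models = {}
    for region, (ages, weights) in by_region.items():
        if len(set(ages)) >= 2:  # a line needs at least two distinct ages
            models[region] = fit_line(ages, weights)
    return models


# Hypothetical records: (region, age, weight); two are deliberately bad
records = [
    ("Scotland", 100, 95), ("Scotland", 200, 150), ("Scotland", 150, None),
    ("South West", 100, 100), ("South West", 200, 170), ("South West", 120, -5),
]
for region, (rate, intercept) in fit_per_region(records).items():
    print(f"{region}: growth rate {rate:.2f}, intercept {intercept:.1f}")
```

Splitting by region only helps if each region has enough clean data, which is why collecting good data matters as much as the fitting itself.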

Need some help? Use the chat window to your right on web or the help button on mobile, call our customer support team on +44 (0) 3300 436327, or email support@breedr.co