I get the impression that you might be worried that a 1% sample of the data isn't representative of the whole, no matter the absolute number of samples. In fact, how representative a sample is of the whole is mainly a function of the absolute number of samples, not of the fraction.

Let's assume half of all samples are normal and half are outliers (this is a pretty bad dataset). For accurate performance metrics, you want to evaluate your model on normal samples and outliers in a roughly 50/50 distribution. If you evaluate on 2 samples, there is a 50% chance that both of them are normal samples or both of them are outliers, so your metrics are off. If you evaluate on 10 samples, it's getting pretty darn unlikely that all of them are outliers or all of them are normal samples. But still, it could easily be a 70/30 distribution. For a test set of 1000, you're all but guaranteed a roughly 50/50 distribution. So the test set will be representative of your total data set (however large that might be) to within a few percentage points.

For further reading: the law of large numbers.

I think it mostly depends on how much data you have and on why you are training your NN. For personal experiments or work projects (as opposed to paper-like work), you can skip the validation split and do only a train/test split; for me, a separate validation set is redundant in those cases. In your example you have 10 million records. That is a very large dataset, and for me a split of, for example, 80% training and 20% test will be enough. It will cover (or should cover) all the kinds of data you have and give you a clear view of how good your NN is.

If you are worried about the test results, you can split your dataset into 5 batches and train/test on those batches, so each run uses 80% for training and 20% for testing. First you train your NN on batches 1, 2, 3, 4 and test on 5. Next, train on 1, 3, 4, 5 and test on 2, and so on (I'm sure you understand what I mean). At the end you will have 5 results on the test set (some saved arrays or tables), and you can simply average them: this will be your "accuracy".
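The 5-batch scheme (train on four batches, test on the fifth, rotate, then average) could be sketched roughly like this. It is only a minimal illustration using the Python standard library: `k_fold_splits`, `cross_validate`, `train_majority` and `accuracy` are made-up names, and the "model" is a trivial majority-label predictor standing in for the NN, not anything from the question.

```python
import random
from statistics import mean

def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) once per batch, k times in total."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # shuffle once so batches are random
    folds = [indices[i::k] for i in range(k)]   # k roughly equal batches
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

def cross_validate(X, y, train_fn, score_fn, k=5, seed=0):
    """Train on k-1 batches, test on the held-out batch, return all k scores."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(X), k, seed):
        model = train_fn([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(score_fn(model,
                               [X[i] for i in test_idx],
                               [y[i] for i in test_idx]))
    return scores

# Hypothetical stand-in for the NN: always predicts the most common training label.
def train_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)

def accuracy(model, X_test, y_test):
    return sum(1 for label in y_test if label == model) / len(y_test)

if __name__ == "__main__":
    X = list(range(100))
    y = [0] * 70 + [1] * 30                     # toy labels, 70% class 0
    scores = cross_validate(X, y, train_majority, accuracy)
    print("per-batch scores:", scores)
    print("average:", mean(scores))
```

To use it with a real network, you would swap `train_majority` and `accuracy` for your own training and evaluation functions; the splitting and averaging stay the same.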
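The point about absolute sample size can also be checked with a quick simulation. The sketch below assumes the same made-up population as in the argument (half normal samples, half outliers) and estimates how often a randomly drawn test set ends up at 70/30 or worse. The function name `share_off_by` and its parameters are just illustrative choices.

```python
import random

def share_off_by(test_size, tolerance_pct=20, trials=2000, seed=0):
    """Estimate how often the outlier share of a random test set misses 50%
    by at least `tolerance_pct` percentage points."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        # Each random bit is one sample: 1 = outlier, 0 = normal (50/50 population).
        outliers = bin(rng.getrandbits(test_size)).count("1")
        # Integer form of |outliers / test_size - 0.5| >= tolerance_pct / 100.
        if abs(100 * outliers - 50 * test_size) >= tolerance_pct * test_size:
            misses += 1
    return misses / trials

if __name__ == "__main__":
    for n in (2, 10, 1000):
        print(n, share_off_by(n))
```

In theory the chance of a 70/30 or worse split is 50% for 2 samples and about 34% for 10 samples, while for 1000 samples it is practically zero, and the simulated values should land close to that.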