I am trying to model a system I have at work, and could really use some help/advice.
Let me just start by quoting Forrest Gump: "I am not a smart man."
Here goes.
We place balls into a system daily. Some days the number is zero, but could be as many as 20.
Those balls stay in the system for varying amounts of time, and then come out the other end as either Red or Green balls (and it is nearly 50/50).
The number that comes out the other end each day also varies, and does not necessarily match the input number (thus the volume is not constant).
We track each ball individually, based on a unique identifier, the date it entered, the date it exits, and its color upon exiting.
I have a histogram of how many balls exit the system by date ranges and color over the last few hundred samples.
I am trying to determine a reasonably effective way of estimating the number of balls that will come out Green at a chosen future date, based on the balls in the system today, and when they were placed in it.
I will repeat. I am not a smart man. My computer programming skills stopped when I was 13 with BASIC, although I am reasonably adept with Excel (without VBA experience). Thanks in advance for any help you may be able to provide.
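One possible way to frame the question above, offered as a rough sketch rather than a finished model: use the historical time-in-system of balls that have already exited to estimate, for each ball still inside, the chance that it exits within the chosen number of days given how long it has already been in. Multiply by the share that come out Green and add up over all balls. The function names, the `durations` list (days in system for past balls), and the flat `p_green=0.5` are assumptions made for illustration. The same counting could be done in Excel with COUNTIFS over the exit records.

```python
from bisect import bisect_right

def exit_probability(durations, age, horizon):
    """Empirical chance that a ball already `age` days old exits within
    the next `horizon` days, based on past days-in-system values."""
    ordered = sorted(durations)
    lasted_past_age = len(ordered) - bisect_right(ordered, age)
    if lasted_past_age == 0:
        # Assumption: no past ball stayed this long, so there is nothing
        # to estimate from. Returning 0.0 is only a placeholder.
        return 0.0
    exited_in_window = bisect_right(ordered, age + horizon) - bisect_right(ordered, age)
    return exited_in_window / lasted_past_age

def expected_green(durations, ages_in_system, horizon, p_green=0.5):
    """Expected number of Green exits within `horizon` days from the
    balls currently in the system (new arrivals are not counted)."""
    return sum(exit_probability(durations, age, horizon) * p_green
               for age in ages_in_system)

# Made-up history: days in system for eight past balls.
history = [2, 4, 4, 6, 8, 10, 10, 12]
# Two balls inside today: one placed today, one placed 4 days ago.
estimate = expected_green(history, ages_in_system=[0, 4], horizon=4)
print(round(estimate, 4))
```

This only covers balls already in the system today. Balls added between now and the target date would need a separate estimate.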
Ha! I doubt it. I am trying to break it up into different systems and model it this way. The problem I am running into is that the standard deviation is huge, so even once I get the model figured out, the accuracy is going to be less than desired.
OK, I thought I had an answer but rereading this, I'm not sure I understand. You put balls into the box... And they come out red or green. So they have no color when they go in?
Ghost wrote: OK, I thought I had an answer but rereading this, I'm not sure I understand. You put balls into the box... And they come out red or green. So they have no color when they go in?
The balls are just a metaphor. Really, Indy is asking about pregnancy tests. He wants to predict how many side chicks are going to become baby-mamas on any given day.
So, yeah, the "Balls" (potential pregnancies) are in an indeterminate state on the date he "putts them in the system", so to speak, and it is an indeterminate time before the woman ever contacts him to let him know if she is prego. Green = "Yay, a baby!", Red = "Too bad, try again."
Wilt 'Indy' Chamberlain wrote: Some days the number is zero, but could be as many as 20.
The league needs heroes, villains... and clowns. -- Aztec Sunsfan
I have been gone for weeks, and this is one of the first posts I read. Awesome.
Ghost wrote: OK, I thought I had an answer but rereading this, I'm not sure I understand. You put balls into the box... And they come out red or green. So they have no color when they go in?
Correct. They go in colorless, and come out red or green.
I am trying to separate it out into different systems and then clumsily add them together in the end. I might just have to create a big table array that has different values like a cheat sheet to look up if you know the number of balls in the system, their respective ages, and then do a different estimate for new balls as they enter the system up to some point in the future.
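The cheat-sheet idea described above could look something like the following. This is a hedged sketch, not a tested model: it builds a lookup table of "expected Green exits per ball" by current age for one fixed horizon, then handles new balls separately using an average daily input. All names (`build_cheat_sheet`, `avg_daily_input`, and so on), the flat 50% Green rate, and the sample numbers are assumptions for illustration.

```python
def build_cheat_sheet(durations, horizon, max_age, p_green=0.5):
    """Table of age -> expected Green exits per ball within `horizon` days."""
    table = {}
    for age in range(max_age + 1):
        lasted_past_age = sum(1 for d in durations if d > age)
        exited_in_window = sum(1 for d in durations if age < d <= age + horizon)
        # Assumption: ages with no history get 0.0 as a placeholder.
        table[age] = (p_green * exited_in_window / lasted_past_age
                      if lasted_past_age else 0.0)
    return table

def estimate_from_sheet(table, ages_in_system):
    """Look up each ball currently in the system by age and add up."""
    return sum(table.get(age, 0.0) for age in ages_in_system)

def estimate_from_new_balls(durations, avg_daily_input, horizon, p_green=0.5):
    """Expected Green exits from balls that enter over the next `horizon` days.
    A ball entering k days from now has only (horizon - k) days to exit."""
    total = 0.0
    for k in range(1, horizon + 1):
        days_left = horizon - k
        share_out_in_time = sum(1 for d in durations if d <= days_left) / len(durations)
        total += avg_daily_input * share_out_in_time * p_green
    return total

# Made-up history: days in system for eight past balls.
history = [2, 4, 4, 6, 8, 10, 10, 12]
sheet = build_cheat_sheet(history, horizon=4, max_age=6)
existing = estimate_from_sheet(sheet, ages_in_system=[0, 4])
incoming = estimate_from_new_balls(history, avg_daily_input=2, horizon=4)
print(round(existing + incoming, 4))
```

Adding the two pieces together is the "clumsy" sum of separate systems. It treats existing and new balls as independent, which is itself an assumption.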
Yes and no. Typically, the older ones are more likely to come out Red, but you don't know that when they go into the system. And for the purposes of this "simple" model I was going to ignore that increased likelihood and tweak the output in the future if needed.
Then I think the model essentially comes down to 50/50. If there are no significant factors that determine when or what color a ball comes out, the only numbers that seem relevant are the number of balls at a given moment, the average frequency they come out, and 50%.
If age does skew the ratio, it still comes down to the overall average, which you said is roughly 50/50. So I don't see any significant factor besides the total number of balls in the system and the average duration they are there.
And, if you are trying to peg a single number for a given day, you will have problems. I'm going to assume that while you may add up to 20 in a day, and potentially none will come out, the average number in the system is pretty close to that at a given moment. That doesn't give you a lot to work with, and a small sample means high variance.
Yeah, as I mentioned, the standard deviation is very high. So it will always have a very high variance. I realize there is no good way to model this with high accuracy, but alas, I have been asked to predict the future (knowing it is a fuzzy picture).
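Since the picture will be fuzzy either way, it may help to report a spread alongside the single number. A minimal sketch, assuming each ball comes out Green by the target date independently with its own probability (a strong assumption that may not hold here): the expected count is the sum of those probabilities, and the variance is the sum of p * (1 - p). The function name and the sample probabilities are made up for illustration.

```python
import math

def green_count_spread(green_probs):
    """Mean and standard deviation of the Green count, assuming each ball
    is an independent yes/no event with its own probability."""
    mean = sum(green_probs)
    variance = sum(p * (1 - p) for p in green_probs)
    return mean, math.sqrt(variance)

# Made-up example: 20 balls, each with a 20% chance of exiting Green in time.
mean, sd = green_count_spread([0.2] * 20)
print(round(mean, 2), round(sd, 2))
```

With these made-up inputs the estimate is about 4 Green balls with a standard deviation of roughly 1.8, which shows how wide the range stays when only a couple dozen balls are in play.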
And the volume in the system fluctuates quite a bit, but we don't have a frequent enough picture of the volume to truly get a day by day dataset.
The good news is that in 2 months we will be placing all of these into a common electronic system and will have real-time visibility into every ball at each stage of the process once it goes into the bucket. That should really help with our modeling after we get some data in there.