Welcome to the LimeSurvey Community Forum

Limiting number of participants in a specific branch / condition of a survey

9 years 1 week ago #118010 by David_M
Hi everyone,

I did a cursory search on this but I couldn't find anything that would be directly useful, so I was hoping for your help. The problem is that I assume there are many solutions to this, so it is hard to construct search terms. I am reasonably good with computers, but I have not done any scripting in years. So I am rusty, but would probably understand any code provided. Thank you in advance!

Situation:
I have a survey with several different conditions inside it. My sampling method does not allow me to create a separate survey for each condition, as that could mean inadvertently getting repeated measures in an indeterminate number of cases.

So, I have a single survey, but people see one of six different branches at random. Because I want to use general linear model methods in the analysis, I need the same number of respondents (or as close to it as possible) in each group.

One way would be to implement survey-level (not respondent-level) persistent counters for each condition and remove the option to participate in a condition once the required number of participants have answered. Let's say a hundred people per condition.



Some folks from this forum have helped my colleague before with the randomisation routine, and this is tested and works. We essentially have a randomisation matrix in a hidden question, looking like this:

var conditionNumbers = [1, 2, 3, 4, 5, 6];

and then a lot of shuffling. We initially wanted to remove branches by hand (by changing the matrix), but this is problematic because we would not immediately know which conditions were oversubscribed. So we wanted to automate it.
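
A minimal sketch of what the shuffling step could look like, assuming a standard Fisher-Yates shuffle. The thread does not show the actual routine, so the function name `shuffle`, the injectable `random` parameter, and the `assignedCondition` variable are illustrative assumptions:

```javascript
// Fisher-Yates shuffle: returns a new array in uniformly random order.
// `random` is injectable (assumption, for testing); it defaults to Math.random.
function shuffle(items, random = Math.random) {
  const result = items.slice(); // copy, so the input array is left untouched
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1)); // random index in 0..i
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}

const conditionNumbers = [1, 2, 3, 4, 5, 6];
const shuffled = shuffle(conditionNumbers);
const assignedCondition = shuffled[0]; // the branch this respondent would see
```

Removing a branch by hand then amounts to deleting its number from `conditionNumbers` before the shuffle.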

I figured that this is what we need to do:

(a) Have some sort of persistent counters for each of the conditions 1-6.
(b) Read the counters (should I put them in the survey table? I have access to the MySQL database. Can I do it through Lime, or should I just create a survey and add a field to the survey table through a MySQL frontend?)
(c) Implement the matrix in some way (probably simplest if I construct a string). Something along the lines of (pseudo-language):
RequiredResponses = 100
ConditionNumbers = '['
if (Counter1 < RequiredResponses) then ConditionNumbers = ConditionNumbers + '1,'; Counter1 = Counter1 + 1;
if (Counter2 < RequiredResponses) then ConditionNumbers = ConditionNumbers + '2,'; Counter2 = Counter2 + 1;
...
if ConditionNumbers = '[' then Exit and tell people thanks for participating, but the survey is now closed. /* This means all conditions have been met */
if ConditionNumbers <> '[' then replace the trailing ',' with ']'. /* This means at least one condition is still available; because I am adding to a string, I need to close it */
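
The pseudocode above can be sketched in JavaScript without the string round trip by filtering an array directly. This is only an illustration: the `counters` object holds made-up values, the function names are hypothetical, and reading and writing the counts from persistent storage (and handling two respondents arriving at the same moment) is not covered. Unlike the pseudocode, the sketch increments only the counter of the condition that was actually chosen:

```javascript
const requiredResponses = 100;

// Hypothetical counter values; in practice these would be read from wherever
// the persistent counts are stored.
const counters = { 1: 100, 2: 87, 3: 100, 4: 42, 5: 99, 6: 100 };

// Keep only the conditions that are still below the target.
function availableConditions(counts, required) {
  return Object.keys(counts)
    .map(Number)
    .filter((condition) => counts[condition] < required);
}

// Pick one open condition at random, or null when every condition is full.
function pickCondition(counts, required, random = Math.random) {
  const stillOpen = availableConditions(counts, required);
  if (stillOpen.length === 0) return null; // all conditions met: survey closed
  return stillOpen[Math.floor(random() * stillOpen.length)];
}

const openConditions = availableConditions(counters, requiredResponses); // [2, 4, 5]
const chosen = pickCondition(counters, requiredResponses);
if (chosen !== null) counters[chosen] += 1; // count only the chosen condition
```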


My Questions:
- Is my method too complicated, and could be done in a much simpler way?

- In case I am doing it right (tm) at least slightly, then my questions are:
- How do I implement persistent counters?
- How do I read them in JavaScript?
- How do I construct the ConditionNumbers array? In other words, is there a simpler way than converting it to a string and back?
- How do I write the counters back from JavaScript?

Thank you so much for your time and staying with me until the end. All your help would be greatly appreciated.

David
The topic has been locked.
9 years 1 week ago #118039 by jelo
I would keep it simple. Just generate a random number and choose the different branches with conditions on the random number.

Since you cannot be sure that participants will finish your survey, you shouldn't need to implement a persistent counter. You might tweak the random generator to ensure a better distribution between 1 and 6. It perhaps depends on the PHP version used, but I sometimes got the impression that 1-3 come up more often than 4-6. But even without tweaking you will get a good distribution.
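
A rough sketch of the random-number approach, written in JavaScript for consistency with the snippet in the first post. How the number is actually generated inside the survey is not shown in this thread, so `randomInt` and `tallyDraws` are hypothetical helpers; the tally is just one way to check empirically how even the draws are:

```javascript
// Uniform random integer in [min, max], both ends inclusive.
function randomInt(min, max, random = Math.random) {
  return min + Math.floor(random() * (max - min + 1));
}

// Empirical check of the distribution: tally many draws per branch.
function tallyDraws(draws, branches = 6) {
  const tally = new Array(branches).fill(0);
  for (let i = 0; i < draws; i++) {
    tally[randomInt(1, branches) - 1] += 1;
  }
  return tally;
}

const branch = randomInt(1, 6); // the branch to show, 1..6
const drawCounts = tallyDraws(60000); // each entry should land near 10000
```

With only about 100 respondents per branch, chance variation between branches will be proportionally larger than in this large tally.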

A quota can act as a persistent counter if the exit URL is used to redirect to the survey again with a variable in the URL, which is then used to redirect to a different branch in the survey.
IMHO that extra work with workarounds is not needed to ensure that you get enough respondents for every branch.

The meaning of the word "stable" for users
www.limesurvey.org/forum/development/117...ord-stable-for-users
9 years 1 week ago #118088 by David_M
Thank you for this. I appreciate your quick reply. I know, I am probably slightly too pedantic about this.

As you probably know, one of the GLM constraints is equal group sample sizes, so that is what worries me. 100 participants per group is not enough to break through the threshold where unequal group sizes stop mattering (according to Tabachnick and Fidell, group n must be > 300 in order to be able to ignore that constraint). So, if I get 95 and 110 in two groups, the discrepancy is already 15% or so.
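
For reference, the "15% or so" figure can be reproduced as below, assuming the discrepancy is measured relative to the smaller group (the post does not say which base is meant, and `relativeDiscrepancy` is a hypothetical helper):

```javascript
// Relative difference between two group sizes, as a fraction of the smaller group.
function relativeDiscrepancy(nA, nB) {
  const smaller = Math.min(nA, nB);
  const larger = Math.max(nA, nB);
  return (larger - smaller) / smaller;
}

const discrepancy = relativeDiscrepancy(95, 110); // 15 / 95, about 0.158
```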

I will look into the quotas and see what I can find there. I just hoped there was an easy way to write to a database and read from it.


That would also be helpful in another experiment that I am constructing, where I ask a few general questions and, depending on the answers, send people on to a specific branch of the survey. My sorting criteria are several low-probability events, so I need to constrain the number of participants who have not experienced any of the proposed events. Otherwise, I would get wildly disparate group sizes (the probability of experiencing any of the events I am interested in is about 2-3%). If I tell the participants what I am looking for, they will all of a sudden all have experienced these events (as I pay a bonus to those who spend more time on a survey). It is unfeasible to pay all the participants the same without any constraints on the group sample sizes, as there is every chance that I will get, let's say, 5 respondents in every group I am interested in (30 in total) in a sample of 1000. I cannot do anything with 5 people per group, and yet I would still have to pay out hundreds of monetary units. So, in that case, I absolutely, desperately need to do triage in advance.

I've never worked with quotas before, so this is all new to me. But is it possible to set a quota for a specific group of answers? If it is, then I see your logic. Let me just recap to see if I understand correctly.


Let's say I have GD (group default), where I ask whether people have experienced one of the measured events. Let's say a checklist
C1 Event 1
C2 Event 2
C3 Event 3
C4 Event 4
C5 Event 5
C6 Not Experienced any of those

survey logic:
C1 leads to GE1 (group event 1)
C2 leads to GE2 (group event 2)
C3 leads to GE3 (group event 3)
C4 leads to GE4 (group event 4)
C5 leads to GE5 (group event 5)
C6 leads to GN (group No event)

Groups:
GE1-5 each have a quota of 100. If a group's quota is exceeded, I redirect to the end of the survey.
GN has a quota of 400 (so I can use pristine data in comparison to GE1-5, without having to do post-hoc tests countering data fatigue). If its quota is exceeded, I redirect to the end of the survey.


OK. I went back to the LimeSurvey manual. It seems that it is quite possible to set a quota on specific responses. So this is not a question anymore, but I am leaving it here in case someone else benefits. What seems doable is this:

Same structure as above.
Checkboxes become radio lists.
Set quotas:
C1 to C5 - 100
C6 - 400

Hide all groups coming after GD.
Survey logic (in GD): if C1 is selected and not over quota, un-hide GE1
etc.
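
The routing described above can be sketched as plain JavaScript logic. This only models the decision, not how LimeSurvey evaluates quotas or relevance internally; `quotaLimits`, `groupForAnswer`, `routeRespondent`, and the `completed` counts are hypothetical names and values built from the codes in this post:

```javascript
// Quota limits per answer code, taken from the post above.
const quotaLimits = { C1: 100, C2: 100, C3: 100, C4: 100, C5: 100, C6: 400 };

// Answer code -> question group to un-hide.
const groupForAnswer = { C1: "GE1", C2: "GE2", C3: "GE3", C4: "GE4", C5: "GE5", C6: "GN" };

// Returns the group to show, or "END" when the quota for that answer is full.
// `completed` holds the number of completed responses per answer code.
function routeRespondent(answer, completed) {
  if (!(answer in quotaLimits)) {
    throw new Error("Unknown answer code: " + answer);
  }
  const done = completed[answer] || 0; // no entry yet means zero completes
  if (done >= quotaLimits[answer]) return "END";
  return groupForAnswer[answer];
}

// Made-up counts for illustration.
const completed = { C1: 100, C2: 12, C6: 399 };
routeRespondent("C1", completed); // "END" (quota reached)
routeRespondent("C2", completed); // "GE2"
routeRespondent("C6", completed); // "GN"
```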


Sorry for the long post. If anyone expresses an interest, I can post updates about how I got on with this.

Thanks,
David
9 years 1 week ago #118092 by jelo

David_M wrote: Since the 100 participants per group is not enough to break through the threshold where the unequal group sizes do not matter (according to Tabachnick and Fidell, group n must be > 300 in order to be able to ignore that constraint). So, if I get 95 and 110 in two groups the discrepancy is already 15% or so.


What is the maximum number of potential participants you can reach? Are they forced to participate?
Is the invitation already a random controlled process?

Usually the biggest issue is non-response and not finishing the questionnaire.
You have six groups to fill with respondents. But n > 300 in every group seems to be unreachable.

What I don't get is why you cannot allow 95, 110, 100, 95, 120, 130 in your groups and only use n=95 from every group for your analysis. What would be the difference from 95, 95, 95, 95, 95, 95? I wouldn't control for strictly equal sizes during the fieldwork, because when doing a quality check you might have to exclude participants, and your group sizes will change after the fieldwork is finished. Happens all the time.
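
One way to do the trimming suggested here is to draw a random subsample of the smallest group's size from every group. The sketch below assumes responses are held in plain arrays keyed by condition; `sampleWithoutReplacement`, `equalizeGroups`, and the placeholder response IDs are all hypothetical:

```javascript
// Random sample of n items without replacement (partial Fisher-Yates).
// Assumes 0 <= n <= items.length.
function sampleWithoutReplacement(items, n, random = Math.random) {
  const pool = items.slice();
  for (let i = 0; i < n; i++) {
    const j = i + Math.floor(random() * (pool.length - i)); // index in i..end
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, n);
}

// Trim every group to the size of the smallest one.
function equalizeGroups(groups, random = Math.random) {
  const minSize = Math.min(...Object.values(groups).map((g) => g.length));
  const result = {};
  for (const [name, members] of Object.entries(groups)) {
    result[name] = sampleWithoutReplacement(members, minSize, random);
  }
  return result;
}

// Placeholder response IDs for group sizes 95, 110, 100, 95, 120, 130.
const sizes = [95, 110, 100, 95, 120, 130];
const groups = {};
sizes.forEach((n, k) => {
  groups["condition" + (k + 1)] = Array.from({ length: n }, (_, i) => "c" + (k + 1) + "-r" + (i + 1));
});
const balanced = equalizeGroups(groups); // every group now has 95 members
```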

You might do an additional step and use a matching routine like propensity score matching to get the groups in balance before applying GLM analysis. But that will reduce the sample size too.

9 years 1 week ago #118102 by David_M
Hi Jelo. Thanks for replying.

I would be using Amazon mTurk. That means that as long as you have the money you can get respondents. You can get about 500 responses in about an hour, if you pay a bit more than minimum effective rate (like $0.50).

My problem is not getting participants. It is not even them finishing - I usually incentivise completion and I get about 90% completion rates. My problem is that mTurkers talk amongst themselves and discuss optimum strategies to get maximum bonus. There are forums for that.

So, I do not want it to be clear while the experiment is running what it is about and how to increase your chances of maximizing utility. My argument to the ethics board is exactly that. I will disclose, but after data collection. If need be, I'll send them an email with the debriefing notes once I am done. This is OT.

The maximum number of participants I can reach is about 10 million. They are not forced to participate, but do so willingly, because they get paid to do it. I don't really have the funds to pay 10 million participants, anyway. And am not looking for that many.

As I said before, in the second experiment I am looking into a low probability event. There is an opportunistic sample, of which about 2% are affected. I do not want to pay a hundred people each time in order to get two usable responses.
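
To put a number on that, here is a back-of-the-envelope sketch of how many people would have to be screened without any triage, assuming each respondent independently has the event with the stated probability. The helper `expectedScreened` and the target of 100 usable responses are illustrative assumptions:

```javascript
// Expected number of respondents to screen to reach `target` usable responses
// when each respondent qualifies with probability `prevalence`.
function expectedScreened(target, prevalence) {
  if (!(prevalence > 0 && prevalence <= 1)) {
    throw new RangeError("prevalence must be in (0, 1]");
  }
  return Math.round(target / prevalence);
}

const screened = expectedScreened(100, 0.02); // about 5000 respondents
const unusable = screened - 100; // respondents paid for but not in the target group
```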

The problem with > 300 in group sample size is that some journals (like Psychological Science) will reject studies like that -> the sample size is too big and with such big samples all of a sudden everything has a significant effect on everything else and the observed power has long since become 1.0. Big sample sizes would be OK in EFA and CFA and if I wanted to look at interactions in AN(C)OVA, but I am interested in main effects, thus far.

If I get 95, 110, 100, 95, 120, 130 and prune down, then I have an issue with some journals like, for example, Behavioural Research Methods. I guess I could argue that I wanted to stay within GLM constraints and randomly remove the appropriate number of responses, and all of that would work if I presented, in parallel, the results of an analysis where respondents were not removed and reached roughly the same results. That, however, would not work in my second experiment, where I would for example have 2500 vs 100 in some of the lower-probability-event groups :). There is no way I can justify removing 2400 responses from a sample of 2500. And I would certainly be asked why I paid for and collected all those responses if I am chucking them now.

Of course I agree with you that there are always outliers and responses that need to be cut. In the first experiment we agreed (with my co-author) that we wouldn't attempt to control at all, so I am with you 100% there. And thank you for your comments.

But in my second experiment, that is simply not an option for reasons mentioned above.

Thank you for taking the time. I appreciate it.
David
  • LimeSurvey Community Team
9 years 1 week ago #118104 by holch
Reading your first post I would say you can simply use quota.

I think a quota cannot be set on text/number questions, but in this case you could try to use a single-answer question from 1-6 and mark the respective number according to your random number. Once you get to "x" completed interviews for one of the numbers, they are screened out.

This might also be the biggest problem of this approach, because those respondents would not finish. However, at 50 cents per respondent (what kind of respondents you will get for this amount is another aspect), it doesn't hurt much if a few run into the screen-out. You could redirect to a PHP page that sends you an alert via email that one of the quotas is full.

You could then adapt the randomization script.

I answer at the LimeSurvey forum in my spare time, I'm not a LimeSurvey GmbH employee.
No support via private message.

9 years 1 week ago #118108 by jelo

holch wrote: Once you get to "x" completed interviews for one of the numbers, they are screened out. This might also be the biggest problem of this approach, because those respondents would not finish.

My first idea was not to really screen out via quota, but to use the quota URL to redirect the user to the same survey again, with a parameter attached to the URL to jump to a different group inside the survey.
That way group one gets filled first, group two second, etc. The problem with that approach is that you have to set a limit. But you can raise the limit while the survey is live in the field.

Well, when having 2500 vs. 100 you can still do your analysis. Why? You use the first 100 of the 2500 group. If you use quota you would have (when not raising the limit of e.g. 100) 100 vs. 2500.

I don't have the "pleasure" of submitting papers to journals, but if journals see no problem when mTurk is the universe, I wonder why matched groups (e.g. via propensity score) are a problem.

Is there a paper about the mTurk community?
I directly think about pictures like this, when I hear mTurk:
news.softpedia.com/news/Shocking-Image-S...pulated-471961.shtml

Back to the quota. Use a loop to fill up all groups and then begin to raise the quota.
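
The fill-then-raise loop can be sketched as follows. This models only the decision logic, not the quota URL redirect itself; `nextGroup`, the `state` object, and the step size are hypothetical:

```javascript
// Returns the number (1-based) of the first group whose count is below the
// current limit. When every group is full, the limit is raised by `step`
// and the search repeats. `state.limit` is updated in place.
function nextGroup(counts, state, step) {
  if (!(step > 0)) throw new RangeError("step must be positive");
  while (true) {
    const index = counts.findIndex((count) => count < state.limit);
    if (index !== -1) return index + 1;
    state.limit += step; // all groups full: raise the quota and try again
  }
}

const state = { limit: 100 };
nextGroup([100, 100, 100, 100, 37, 0], state, 50); // 5 (groups 1-4 are full)
nextGroup([100, 100, 100, 100, 100, 100], state, 50); // 1, and state.limit is now 150
```

Because groups fill one after another, group membership is tied to arrival time, which may or may not matter for the analysis.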
