MTurk
Overview
Amazon Mechanical Turk (MTurk) is an online marketplace that can be used to collect data quickly and cheaply by setting up HITs (Human Intelligence Tasks) that people complete online. To manage and run MTurk experiments, we use CloudResearch, a third-party website that provides a user-friendly interface and useful features.
Links
- CloudResearch: https://account.cloudresearch.com/Account/Login
- Amazon Mechanical Turk: https://www.mturk.com
- Note: Our lab's ID is A2Z7DRFCANJ7P
- Other Links:
- TurkOpticon: https://turkopticon.net/
- TurkerView: https://turkerview.com/
- Resources and Information: https://journals.sagepub.com/doi/full/10.1177/0149206320969787
Preparing your Experiment for MTurk
- Experiment link: You will need to create a link for your web-based experiment (use Jarvis). MTurk workers (called Turkers) will use the link to complete your experiment.
- Save participant ID: You will need to record each participant's "workerId" (this will let you match a person to their completion code and/or verify that someone who contacts you actually completed the HIT). CloudResearch (formerly TurkPrime) automatically inserts query data into the URL. The following code gets that data and adds it to the to-be-saved data.
// get workerId from the URL query string
var subNum = jsPsych.data.getURLVariable("workerId");
// add the subject ID to every trial's data
jsPsych.data.addProperties({
  subject: subNum
});
- Completion code: Your experiment should include a unique, randomly generated code for each participant (Turker). They will enter this code on MTurk to prove they actually completed the experiment. The code should be displayed at the end of the experiment, after the data have been saved (typically at the bottom of the debriefing form); see the sketch below.
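A minimal sketch of one way to generate, save, and display such a code (jsPsych 6-style syntax, to match the snippet above); the variable and property names are illustrative, not a lab convention:
// generate a random completion code (roughly 8 alphanumeric characters)
var completionCode = Math.random().toString(36).substring(2, 10).toUpperCase();
// save the code with the data so it can be matched to the workerId during approval
jsPsych.data.addProperties({
  completion_code: completionCode
});
// show the code on the final (debriefing) screen, after the data have been saved
var debriefTrial = {
  type: "html-keyboard-response",
  stimulus: "<p>Thank you for participating!</p>" +
            "<p>Your completion code is <strong>" + completionCode + "</strong>.</p>" +
            "<p>Enter this code on MTurk to receive payment.</p>"
};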
Create a New Study on CloudResearch
Log in to CloudResearch with the lab's MTurk email account.
Either copy and modify a prior study (completed studies will be in the "Archived" tab) by choosing "Copy Study" from the "Options" menu under the study you wish to copy or click "Create a Study" and select "MTurk Toolkit".
Then, fill out each page of the study details (see below).
Study Details
- Basic Info.
- Project Name: Give your project a name that will let you and other lab members know exactly what project this HIT belongs to (e.g., "ODay Dissertation E4" or "Delayed Self-Scoring E3"). Note: this is not the name the participants will see.
- Email Address: Enter an email address to be notified when the experiment has run the selected number of participants. Check this email regularly while you are collecting data so that you can quickly respond to participants if they contact you.
- Survey Hyperlink: This should be the "Experiment URL" generated by Jarvis or the "Survey Link" if you are using quotas in Jarvis.
- Auto-capture worker information: Choose "Yes" to have the workerId, hitId, and assignmentId added to the URL. Make sure that your experiment program is pulling these from the URL and saving them to the data (see the sketch after this list). Note: you can opt to use an anonymized workerId under "Additional Privacy Options".
- Describe HIT. Describe the HIT to workers, typically generic information describing a memory experiment or decision-making task (if the test is incidental).
- Title examples: "Memory for key-term definitions" or "Memory Experiment" or "Decision-Making Task"
- Description example: "In this experiment you will study a list of key-term definitions. Your memory for the definitions will be tested." or "You will see words, nonwords, sentences, pictures and/or faces and will be asked to respond to them."
- Setup HIT and Payment.
- Worker Payment Per Survey: The amount participants will be paid upon completion of the HIT.
- We used to pay subjects $0.10 per minute (e.g., $1 for 10 minutes, $2 for 20 minutes, etc.), which works out to $6 per hour.
- However, it is recommended that the hourly average pay be at least minimum wage ($7.25 per hour) and preferably between the federal and the highest statewide (CA) minimum wages.
- Note: $10 per hour seems to be acceptable to most Turkers.
- Expected Time To Complete Assignment: This should be your best estimate of the maximum time to complete the experiment. Note: Make this a conservative estimate because workers will be pleasantly surprised if it takes less time than anticipated but very unhappy if it goes over.
- Time Allotted Per Assignment: How long they have to finish the experiment once they accept the HIT. This should be 3-4 times the expected length. You do not want people to start your experiment and then take a break and come back to it later (ask them at the start to complete the HIT in one sitting), but you also do not want them to be locked out of the HIT if it took them a bit longer than expected to finish. Giving them the right-sized window allows them to start once they are ready (have eliminated distractions and are prepared to finish in one sitting) while still encouraging them to finish in a timely manner. Err on the shorter side (i.e., 90 minutes for a 30-minute experiment); the worst-case scenario is that they email you saying they finished the HIT but did not have enough time to submit their completion code, and you then set up a dummy HIT to pay them after checking for them in your data.
- HIT Expires In: Choose how long you want the experiment to be posted. I usually choose 1 week, but it never takes that long.
- Batching Options: Choose "HyperBatch"; this will create multiple smaller (n = 9) HITs, which lowers the MTurk fee. ("MicroBatch" is similar but takes longer because the HITs are posted sequentially rather than simultaneously.)
- Additional HIT Options:
- Select all EXCEPT "Display the Median Hourly Rate" (because the median time is usually longer than the actual time spent completing the experiment, this rate can be misleading) and "Do you want to automatically bonus all workers in your study?".
- Set "Location Requirements" to "must be from United States".
- Demographics.
- Number of Survey Participants: Enter the number of subjects you want to participate.
- Choose Demographics: Leave blank; these cost extra.
- Enter the link to your experiment on this tab!
- Check the box for displaying the HIT (experiment) to only workers that qualify. The final two boxes are a matter of preference.
- Data Quality
- Select "CloudReseach Approved Participants" and ALL options under "Additional Data Quality Settings"
- How Workers are Approved.
- Select option to "Manually" approve HITs. It is good practice to manually approve workers as one person could send the completion code to their friends, who would enter the code but not complete the experiment (their ID would not be in your data).
- Select "Custom Completion Code"
- Autopay Workers In: Set time frame for when participants will be automatically paid if not manually approved/rejected. Have this be less than 30 days; I suggest 7 days because you will be approving/rejecting everyone within 24 hours and Turkers are reluctant to participate if they think they might have to wait very long to be paid.
- Worker Requirements.
- Choose to exclude workers who are not eligible for this experiment.
- Exclude workers who complete these surveys: Use this to exclude participants who already completed a similar experiment. For example, a prior experiment in the same line (i.e., exclude anyone who did E1 from doing E2) or an experiment that used the same stimuli. Select them from the drop-down list.
- Super Exclude: Select this option! It will exclude participants who started one of those HITs, even if they didn't finish it.
- Survey Group: Lets you group experiments together such that workers can only participate in/start one from that group. This is a newer feature that we can use moving forward to group experiments by line such that one person can only participate once for that set of experiments.
- Worker Group: Is similar to "Survey Group" but can be used to either include or exclude workers across experiments. This could be used to make sure that a worker who has already learned Lithuanian-English translations in one experiment isn't allowed to participate in future experiments (even if from a different line/procedure) that include that type of stimuli.
- Naivete: Leave blank
- Alternatively, you can choose to include only certain workers. For example, only people who completed part one of a two-part experiment. Note: if you have to set up a dummy HIT to pay a participant who finished an experiment but wasn't paid by MTurk, restrict that HIT to only include that person's ID.
- Worker Qualification: Select "Yes"
- HIT Approval Rating = 95-100%
- Number of HITs Approved= 1,000 - 1 million
- Save. Saves the changes you made to these tabs, but does not post the experiment to MTurk.
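For the "Auto-capture worker information" setting under Basic Info, here is a minimal sketch (again jsPsych 6-style, matching the snippet in the preparation section) of pulling all three identifiers from the URL and saving them with the data; the property names are illustrative:
// pull the identifiers CloudResearch appends to the URL
var workerId = jsPsych.data.getURLVariable("workerId");
var hitId = jsPsych.data.getURLVariable("hitId");
var assignmentId = jsPsych.data.getURLVariable("assignmentId");
// save all three with every trial so they appear in the data file
jsPsych.data.addProperties({
  subject: workerId,
  hit_id: hitId,
  assignment_id: assignmentId
});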
TODO: IS THE FOLLOWING STATEMENT CORRECT?
To launch an experiment, go to the Dashboard and click the green launch button.
TODO: Create a new section: Setting up multi-session studies
Approving Workers
After workers finish your experiment, they will submit the HIT. Typically, we have some string of random characters that is presented at the end of an experiment (i.e., a "completion code"). Subjects are informed that they should type this string into a textbox in order to receive payment. Most people type the string in correctly, some people type in their worker id, and others just make stuff up.
Because this code is housed within our experiment and is external to CloudResearch, you will need to open your data file and the HIT approval menu together.
- Copy the workerId from the approval menu and use the search function in Excel to see if the ID appears in your data set (indicating that the participant completed the experiment). A scripted alternative is sketched after this list.
- If it does, do a quick skim of the data to make sure they tried. Did they answer some questions? Are their responses coherent? Does it seem like a person rather than a computer program generated these responses?
- If everything looks good, you can accept their HIT and they are paid.
- If there is a major problem with their HIT (e.g., they completed a 60 minute experiment in 5 minutes and everything is blank), then either reject their HIT or add them to an exclude list so that they will not be able to complete any of our future experiments.
- Because we restrict participation to workers with a very high approval rating (95% or higher), people don't like being rejected. It hurts the high status they have worked very hard to earn. If you reject someone, EXPECT EMAILS.
- Make sure every rejection is justified. As stated before, some people type in random completion codes. They are doing this in hopes of getting accepted for HITs they didn't complete. These should be rejected and probably won't email. For example, if a Turker submits the HIT with an incorrect/made-up completion code and their workerId does not show up in the data, then the person likely did not do the experiment and should not be paid.
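If you prefer to script the cross-check rather than searching by hand in Excel, here is a minimal Node.js sketch; it assumes the data are in a file named data.csv with workerId and completion_code columns (both names are illustrative) and that fields contain no embedded commas:
// check-worker.js — look up a submitted workerId in the saved data
// usage: node check-worker.js AXXXXXXXXXXXX   (placeholder workerId)
var fs = require("fs");

var workerId = process.argv[2];
var rows = fs.readFileSync("data.csv", "utf8").trim().split("\n");
var header = rows[0].split(",");
var idCol = header.indexOf("workerId");            // assumed column name
var codeCol = header.indexOf("completion_code");   // assumed column name

var matches = rows.slice(1).filter(function (row) {
  return row.split(",")[idCol] === workerId;
});

if (matches.length === 0) {
  console.log("workerId not found in the data; they likely did not complete the experiment.");
} else {
  console.log("Found " + matches.length + " rows for this workerId.");
  console.log("Completion code on file: " + matches[0].split(",")[codeCol]);
}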
Responding to Emails
While you are running an experiment on MTurk, pay attention to the learnlabmturk@gmail.com account and reply to workers' emails in a timely fashion. Workers usually email concerning technical difficulties. Some of the problems described are common, and you can find the lab's standard replies under "canned responses" in the lower left corner of the email composition window. Remember to always be polite in your response and try to solve the worker's issue. Workers tend to be friendly and appreciative and will share positive (and negative) interactions on TurkOpticon and TurkerView.
- The most common problem is people forgetting or failing to submit the completion code before the HIT expires. The best-case scenario is that you see they completed the HIT but didn't put in the correct completion code and you can still pay them. The worst-case scenario is that you have to create a new HIT that only they can complete (you can specify this using their workerId, which they will send you when they email). You create this HIT and send them an email notification. They submit this "make-up" or "dummy" HIT, where they don't have to do anything, and you pay them for that HIT.
- Another common problem is people having trouble with the experiment. This happens because people close the tab, click refresh, lose internet access, etc. I tend to make them redo the experiment in order to guarantee they did the experiment and to ensure their ID is in the list of people who completed the HIT. Make sure to exclude their data, as they could have potentially completed the experiment twice (double the study time and exposure).
Best Practices
- Be sure to pilot your experiment yourself first, then with 390s, and then with some MTurkers. Test it out with 5-10 people. Afterward, look at the data to make sure everything was recorded correctly, check the recorded duration in the data to make sure your estimate was correct, and read any comments/emails from participants. If applicable, fix issues/make changes (if you do, be sure to exclude those pilot participants from the final analyses) before posting for the target sample size.
- Keep the subject number low at first. It can be increased later but never decreased. You might be tempted to add extra participants to the total assuming that some will be excluded later, but it is better to add a new HIT once you know exactly how many more people you will need.
- Panel options/custom demographics cost more and we typically don’t need any of them to be turned on.
- Use a personal email address for the notification email. This way you are emailed when the study is finished, creating less clutter in the lab's MTurk email account.
- Make your expected time to complete longer than it takes especially if the experiment is not experimenter-paced. If people finish early they are happy. If they finish late they are very mad. It is definitely worth it to spend another $0.50-$1.00 to not deal with upset/angry MTurkers who feel like you misled them. Plus, we are reviewed online and want to cultivate a reputation of being positive and fair.
- I usually triple the expected time for the allotted time per assignment (the amount of time they have to turn in the HIT after they accept the HIT).
- I usually set the HIT expiration to be 1-3 days (not more than 7). But you should approve workers as quickly as possible (ideally within 24 hours) to avoid complaints.
- CloudResearch will automatically add a query string for the variable "workerId", so you will need to store that in the experiment.html file and add it to the to-be-saved data.
- Double check that you pasted the correct experiment link. Then check it again just to be extra sure.
- Under worker requirements, you can exclude people who completed other experiments. This is important if you are using the same material across experiments. Add HITs to a "Worker Group" for the stimuli you are using so that they can be excluded from similar experiments.
- Under worker requirements, you can require people to have completed previous experiments to participate. In other words, you could make a HIT for the second session of an experiment and only people who completed part 1 can complete part 2.
- To improve the quality of the collected data, we restrict the approval rating to 95% or higher and require that participants have completed more than 1,000 HITs.
- If you want to make sure you get an equal number of people in each condition using a single link and random assignment, you can use the quota function on Jarvis. Make sure to update the "quota amount" regularly so that participants are only assigned to conditions that still need more participants. If the "quota amount" is equal to or greater than the "current amount" for all conditions, then assignment is random; otherwise, participants are assigned only to conditions that haven't reached the quota (see the sketch below).
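A minimal sketch of the assignment rule described above (an illustration only, not Jarvis's actual code; the condition names and counts are made up):
// quota-based condition assignment (illustrative)
// quota = target n per condition; current = participants assigned so far
var quota = { A: 30, B: 30 };
var current = { A: 30, B: 22 };

function assignCondition() {
  var conditions = Object.keys(quota);
  // conditions that have not yet reached their quota
  var open = conditions.filter(function (c) { return current[c] < quota[c]; });
  // if every condition is full, stop assigning (illustrative choice)
  if (open.length === 0) { return null; }
  // random assignment over the open conditions: fully random while all conditions
  // are under quota, otherwise restricted to conditions that still need people
  return open[Math.floor(Math.random() * open.length)];
}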
Definitions
- AA: Auto-Approval Time. A requester sets an auto-approval time of anywhere between 0 seconds and 30 days for every HIT they create. After a HIT is submitted, it will automatically approve after this amount of time has passed. (Sometimes AA is used to refer to AndAccept links instead of "PandA".)
- AMT: Amazon Mechanical Turk (MTurk). A website for completing tasks for pay; a cloud-based labor platform designed to connect people who need data (Requesters) with people who are willing to provide it (Workers). (Source: /r/mturk FAQ)
- Batch: A type of HIT designed to allow a worker to complete multiple submissions. You can complete as many HITs in the batch as it allows, unless otherwise advised by the requester's instructions.
- Blocks (Hard/Soft): A way for a requester to prevent a worker from accepting any of their HITs. A Turker who receives blocks can be at risk of having their MTurk account suspended (banned). We usually do not block workers; when we want to prevent a worker or group of workers from doing our HITs, we create a qualification and exclude those who have it.
- Bonus HITs: HITs specifically created for a worker who completed a HIT (or part of it) but, for some reason, was not able to submit the final code or does not appear in the list of workers who completed the HIT, and who contacted us via email.
- HIT: Human Intelligence Task. A job posted to Amazon's MTurk platform. Called a "human intelligence task" because bots/programs are less capable of performing the work than human beings.
- Masters: Master Qualification, Categorization Masters, Photo Moderation Masters. "Masters are an elite group of Workers, who have demonstrated superior performance while completing thousands of HITs for a variety of Requesters across the Mechanical Turk Marketplace. Masters must maintain this high level of performance or risk losing this distinction. Mechanical Turk has built technology which analyzes Worker performance, identifies high performing Workers and monitors their performance over time." (Source: Amazon Mechanical Turk FAQ) Using Masters usually implies higher Amazon fees. We do not usually use Masters.
- Turker: Someone who works to complete tasks for pay on Amazon's Mechanical Turk.