| Title: | Prepare 'WEXTOR' Data |
|---|---|
| Description: | Facilitate data preparation for data collected on 'WEXTOR' <https://wextor.eu>, created by Reips and Neuhaus (2002) <doi:10.3758/bf03195449>. Perform plausibility and other checks and make use of cool color palettes and themes for data visualization. |
| Authors: | Annika Tave Overlander [aut, cre] (ORCID: <https://orcid.org/0009-0006-8373-4086>), Ulf-Dietrich Reips [ths, cph] (ORCID: <https://orcid.org/0000-0002-1566-4745>) |
| Maintainer: | Annika Tave Overlander <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 1.1.1 |
| Built: | 2026-06-01 16:54:34 UTC |
| Source: | https://github.com/iscience-kn/rextor |
This function lets you ask dino Rex for help. You can ask about specific functions and topics, or just call the function as-is for some words of encouragement. :)
```r
ask_rex(topic = "?")
```

| Argument | Description |
|---|---|
| topic | What you want to know about: either leave it blank for motivation or specify a topic such as 'ttest' or 'mean'. The input must be in quotes! |

Returns advice and explanations on R; sometimes Rex asks follow-up questions.

```r
# Get motivation
ask_rex()

# Get specific information
ask_rex("mean")
ask_rex(topic = "ttest")
```
Raw data from the BiFiX study (submitted for publication in BRM).

A data frame with 760 rows and 81 columns:

- validation variable
- ID
- Experimental condition
- Big Five items (A - Agreeableness, C - Conscientiousness, E - Extraversion, N - Neuroticism, O - Openness)
- ...

Source: https://github.com/iscience-kn/BiFiX
Returns a logical vector indicating whether a given check column equals "ok".
If the column does not exist in the data, all cases are treated as passing.
```r
check_ok(data, col)
```

| Argument | Description |
|---|---|
| data | A data frame containing plausibility check variables. |
| col | A character string specifying the column name to check for "ok" entries (case-sensitive). |
Used inside the plausicheck() function.
A logical vector of length nrow(data).
```r
example_df <- data.frame(check_ip = c("ok", "nope", "ok", "OK"))
check_ok(example_df, "check_ip")
```
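A minimal base-R sketch of the rule described above: an exact, case-sensitive comparison with "ok", where a missing column counts as a pass. The helper check_ok_sketch() is hypothetical and not part of rextor, and treating NA entries as failing is an assumption about how the real function behaves.

```r
# Hypothetical sketch of the check_ok() logic; not the package source
check_ok_sketch <- function(data, col) {
  # A missing check column means every case passes
  if (!col %in% names(data)) {
    return(rep(TRUE, nrow(data)))
  }
  # Exact, case-sensitive comparison; %in% turns NA into FALSE (assumption)
  data[[col]] %in% "ok"
}

example_df <- data.frame(check_ip = c("ok", "nope", "ok", "OK"))
check_ok_sketch(example_df, "check_ip")       # TRUE FALSE TRUE FALSE
check_ok_sketch(example_df, "check_missing")  # TRUE TRUE TRUE TRUE
```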
The mode is the value that appears most often in a vector or variable. This measure is especially important for nominal variables, since their mean or median cannot meaningfully be reported.

```r
getmode(v)
```

| Argument | Description |
|---|---|
| v | A vector or a variable in a data frame. |

Returns the mode of v, i.e., the value occurring most often.

```r
vec <- c("A", "A", "B")
getmode(vec)
```
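Base R's mode() returns an object's storage mode, not the statistical mode, which is why a helper like getmode() is useful. The sketch below shows one way to compute the most frequent value. mode_sketch() is a hypothetical name, and breaking ties in favor of the value that appears first is an assumption, not documented behavior of getmode().

```r
# Hypothetical sketch of a mode function; not the rextor source
mode_sketch <- function(v) {
  values <- unique(v)
  # Count how often each unique value occurs, then pick the most frequent;
  # which.max() returns the first maximum, so ties go to the value seen first
  counts <- tabulate(match(v, values))
  values[which.max(counts)]
}

mode_sketch(c("A", "A", "B"))  # "A"
mode_sketch(c(3, 1, 3, 2))     # 3
mode_sketch(c("x", "y"))       # "x" (tie)
```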
Get the percentage of a value.

```r
getperc(data, var, val)
```

| Argument | Description |
|---|---|
| data | A data frame. |
| var | A variable in the data frame, written as a character string in quotes, e.g. "var". |
| val | A value of var whose percentage you want, also in quotes. |

Returns the percentage of the value val relative to all values of var.

```r
getperc(data.frame(a = c("duck", "duck", "goose")), "a", "duck")

# Also works when val is indexed (alphabetically)
getperc(data.frame(a = c("duck", "duck", "goose")), "a", 2)
```
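The percentage described above can be sketched in base R as the number of matching entries divided by the number of all entries of var. perc_sketch() is hypothetical; whether getperc() rounds its result or counts missing values in the denominator (as this sketch does) is not documented here. For the percentages of all values at once, base R's prop.table(table(...)) is an alternative.

```r
# Hypothetical sketch; the real getperc() may round or handle NA differently
perc_sketch <- function(data, var, val) {
  x <- data[[var]]
  # Share of entries equal to val, relative to all entries of var, in percent
  100 * sum(x == val, na.rm = TRUE) / length(x)
}

birds <- data.frame(a = c("duck", "duck", "goose"))
perc_sketch(birds, "a", "duck")  # 66.66667

# Percentages of every value at once
100 * prop.table(table(birds$a))
```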
One of the best practices in data collection is including an informed consent question. With this function, you can automatically filter out rows where participants did not explicitly consent to their data being used.

```r
informed_consent(dataframe, varname = "informedconsent")
```

| Argument | Description |
|---|---|
| dataframe | A data frame of data collected with 'WEXTOR'. |
| varname | Character. Name of the informed consent variable. Defaults to "informedconsent". |

Returns the data without the cases that did not check the informed consent box.

```r
data <- data.frame(informedconsent = c("checked", "not checked", "checked"))
informed_only <- informed_consent(data)
```
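To see what such a filter does, here is a minimal base-R sketch that keeps only rows whose consent variable equals a given value and reports how many cases were dropped. consent_filter() and its consent_value argument are hypothetical; the value "checked" comes from the example data above and may not match every 'WEXTOR' export.

```r
# Hypothetical sketch; "checked" as the consent value is an assumption
# taken from the example data, not a documented 'WEXTOR' convention
consent_filter <- function(dataframe, varname = "informedconsent",
                           consent_value = "checked") {
  # %in% treats missing answers (NA) as "no consent"
  keep <- dataframe[[varname]] %in% consent_value
  dataframe[keep, , drop = FALSE]
}

data <- data.frame(informedconsent = c("checked", "not checked", "checked"))
informed_only <- consent_filter(data)
nrow(data) - nrow(informed_only)  # 1 case excluded
```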
ip_check() identifies potential duplicate participations in a dataset
by checking for repeated IP addresses. Cases with duplicated IPs are
flagged as "possible duplicate", while unique IPs are labeled "ok".
Keep in mind that in some cases, duplicate IP addresses should not lead to exclusion,
for example when different participants (such as siblings or flatmates) plausibly
use the same device.
```r
ip_check(dataframe)
```

| Argument | Description |
|---|---|
| dataframe | A data frame containing the study data. Must include a variable named ip. |
If WEXTOR-style prefixes (i.e. .wx.) are detected in the variable names,
they are removed prior to performing the check.
The function uses duplicated() to flag repeated IP addresses.
Only subsequent occurrences are marked as duplicates; the first instance
of each IP is treated as valid.
A data frame with an additional column check_ip, indicating
whether each case has a unique IP ("ok") or is a potential duplicate
("possible duplicate").
```r
data <- read_WEXTOR(path_to_file("BiFiX_data_raw.csv"))

# The example data does not contain real IPs (data protection), so we use simulated ones
data$ip <- sample(1:1000, nrow(data), replace = TRUE)
new_data <- ip_check(data)
```
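The duplicated() rule described above can be seen on a small vector of made-up IP addresses from the reserved documentation range 192.0.2.0/24. The second part shows a variant that is not what ip_check() does: combining duplicated() with fromLast = TRUE marks every case that shares an IP, including the first one, which can be handy when reviewing possible duplicates by hand.

```r
# Made-up IPs from the documentation range 192.0.2.0/24
ips <- c("192.0.2.1", "192.0.2.2", "192.0.2.1", "192.0.2.3", "192.0.2.2")

# Only later occurrences of an IP are flagged; the first one counts as "ok"
check_ip <- ifelse(duplicated(ips), "possible duplicate", "ok")
check_ip
# "ok" "ok" "possible duplicate" "ok" "possible duplicate"

# Variant (not what ip_check() does): flag every case that shares an IP
shared_ip <- duplicated(ips) | duplicated(ips, fromLast = TRUE)
shared_ip
# TRUE TRUE TRUE FALSE TRUE
```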
This function replaces an old prefix with a new one in the variable names of a data frame: it strips the old prefix from every variable name that starts with it and puts the new prefix in its place.

```r
namepref(dataframe, pref_old, pref_new)
```

| Argument | Description |
|---|---|
| dataframe | A data frame. |
| pref_old | Character string. The old prefix that some or all variable names in the data frame start with. |
| pref_new | Character string. The new prefix to replace the old one. |

Returns the data frame with those variables renamed that start with the old prefix.

```r
bla <- tibble::tibble(x_ar = 1:5, y_ar = 6:10)
blo <- namepref(bla, "x_", "z_")
names(bla)
names(blo)
```
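A minimal sketch of the prefix replacement in base R. replace_prefix() is a hypothetical helper, not the rextor implementation. It uses startsWith() and substring() instead of a regular expression, so prefixes containing special characters, such as the dots in ".wx.", are matched literally.

```r
# Hypothetical sketch; not the rextor implementation
replace_prefix <- function(dataframe, pref_old, pref_new) {
  nms <- names(dataframe)
  # startsWith() compares literally, so prefixes like ".wx." need no escaping
  hit <- startsWith(nms, pref_old)
  nms[hit] <- paste0(pref_new, substring(nms[hit], nchar(pref_old) + 1))
  names(dataframe) <- nms
  dataframe
}

bla <- data.frame(x_ar = 1:5, y_ar = 6:10)
names(replace_prefix(bla, "x_", "z_"))  # "z_ar" "y_ar"
```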
This function adds a new prefix to all variable names of a data frame.

```r
namepref0(dataframe, pref_new)
```

| Argument | Description |
|---|---|
| dataframe | A data frame. |
| pref_new | Character string. The new prefix to add to all variable names in the data frame. |

Returns the data frame with all variables renamed to start with the new prefix.

```r
bla <- tibble::tibble(x_ar = 1:5, y_ar = 6:10)
blo <- namepref0(bla, "var")
names(bla)
names(blo)
```
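Adding a prefix to every name is a one-line operation in base R, sketched below with the hypothetical helper add_prefix(). Whether namepref0() inserts a separator between the prefix and the original name is not stated in this documentation; this sketch does not.

```r
# Hypothetical sketch; no separator is added between prefix and name
add_prefix <- function(dataframe, pref_new) {
  names(dataframe) <- paste0(pref_new, names(dataframe))
  dataframe
}

names(add_prefix(data.frame(x_ar = 1:5, y_ar = 6:10), "var"))  # "varx_ar" "vary_ar"
```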
This function makes the example data from the BiFiX study (submitted for publication in BRM) easy to access.
```r
path_to_file(path = NULL)
```

| Argument | Description |
|---|---|
| path | Name of file in quotes with extension; … |
Gives the path to the example file included in the package. Meant for use in read_WEXTOR().
This function is adapted from palmerpenguins (which is adapted from readxl::readxl_example()).
```r
path_to_file()
path_to_file("BiFiX_data_raw.csv")
head(read.csv(path_to_file("BiFiX_data_raw.csv")))
```
plausicheck() performs basic plausibility checks on a study dataset to
identify potentially invalid or suspicious participation. The function can
check whether participants visited a minimum number of pages, whether the
recorded session length appears plausible, and whether IP addresses indicate
duplicate participation.
```r
plausicheck(dataframe, min_pages, check_sess_length = TRUE, check_ip = TRUE)
```

| Argument | Description |
|---|---|
| dataframe | A data frame containing the study data (needs to contain variables …). |
| min_pages | Numeric. The minimum number of pages a participant must have visited in the study for their participation to be considered plausible. |
| check_sess_length | Logical. Should the session length plausibility check be performed? Defaults to TRUE. |
| check_ip | Logical. Should the IP address plausibility check be performed? Defaults to TRUE. |
If WEXTOR prefixes are detected in the variable names, they are removed before the plausibility checks are applied.
A data frame with additional plausibility check variables. The final variable, check_plausibility, indicates whether all selected checks were passed ("all ok") or whether the case should be excluded ("exclude"). Researchers should verify that the "exclude" cases were correctly identified and are indeed of poorer data quality, to avoid unnecessary data loss.
```r
data <- read_WEXTOR(path_to_file("BiFiX_data_raw.csv"))

# The example data does not contain real IPs (data protection), so we use simulated ones
data$ip <- sample(1:1000, nrow(data), replace = TRUE)

plausi_data <- plausicheck(dataframe = data, min_pages = 6,
                           check_sess_length = TRUE, check_ip = TRUE)
```
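How individual check columns might be combined into check_plausibility can be sketched as follows. combine_checks() is hypothetical, as are the example labels such as "very slow" and the absent check_pages column; the sketch only mirrors the documented behavior that a case is "all ok" when every selected check equals "ok" and that missing check columns count as passed.

```r
# Hypothetical sketch; column names and non-"ok" labels are illustrative
combine_checks <- function(data, check_cols) {
  passed <- rep(TRUE, nrow(data))
  for (col in check_cols) {
    # Skip checks whose column is absent, as check_ok() is documented to do
    if (col %in% names(data)) {
      passed <- passed & (data[[col]] %in% "ok")
    }
  }
  data$check_plausibility <- ifelse(passed, "all ok", "exclude")
  data
}

checks <- data.frame(
  check_ip          = c("ok", "possible duplicate", "ok"),
  check_sess_length = c("ok", "ok", "very slow")
)
out <- combine_checks(checks, c("check_ip", "check_sess_length", "check_pages"))
out$check_plausibility  # "all ok" "exclude" "exclude"
```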
With this function, you can easily read in your 'WEXTOR' data. By default, 'WEXTOR' exports data as a "CSV" file, which stands for "comma-separated values". R offers many ways of reading in this type of data, which can make it hard to decide which one to use, especially for beginners.
```r
read_WEXTOR(filepath, keep_validation = TRUE)
```

| Argument | Description |
|---|---|
| filepath | Location of the WEXTOR CSV file on your computer. |
| keep_validation | Should the validation variable from WEXTOR be kept? TRUE by default. |
In the read_WEXTOR() function, you only need to provide the filepath, i.e. the location of the CSV data file that you downloaded from 'WEXTOR'.
You can also decide right away whether to keep the first column, which contains a so-called validation variable, but you don't have to: by default, rextor keeps this variable for you.
You can explicitly set keep_validation to FALSE if you like your data neat and do not need this extra measure.
The examples below show the usage with the open-source BiFiX data on the Big Five personality traits.
This function prepares the 'WEXTOR' data so that it is readable by both R and you as a human. It will give you your original data and also make the start and end time of each participation easier to read and work with later (by default, 'WEXTOR' will return these values as date and time separately, cluttering your dataset).
The WEXTOR data as an R data object.
```r
data <- read_WEXTOR(path_to_file("BiFiX_data_raw.csv"))

# If you don't need the validation variable, try
data_noval <- read_WEXTOR(path_to_file("BiFiX_data_raw.csv"), keep_validation = FALSE)
```
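The two preparation steps described above, dropping the validation column and merging separate date and time columns, can be sketched with base R on an inline CSV. The column names (validation, start_date, start_time), the date and time formats, and the UTC time zone are all assumptions for illustration; the actual 'WEXTOR' export and read_WEXTOR() may differ.

```r
# Hypothetical sketch: column names and date/time formats are assumptions
csv_text <- "validation,start_date,start_time,age
abc123,2024-03-01,09:15:00,25
def456,2024-03-02,17:40:30,31"

dat <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Rough equivalent of keep_validation = FALSE: drop the first column
dat <- dat[, -1]

# Merge the separate date and time columns into one POSIXct column
dat$start <- as.POSIXct(paste(dat$start_date, dat$start_time),
                        format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
dat$start_date <- NULL
dat$start_time <- NULL

str(dat)
```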
This function strips the server prefix from the variable names of a data frame. It identifies variables whose names start with ".wx.#." (where # stands for any number) or simply ".wx." and removes these prefixes, leaving more legible and usable variable names in your data!

```r
removepref(dataframe)
```

| Argument | Description |
|---|---|
| dataframe | A data frame, ideally containing variables that have the server prefix ".wx.#.". |

Returns the data frame with those variables renamed that start with the prefix.

```r
bli <- tibble::tibble(.wx.1.ar = 1:5, y_ar = 6:10, .wx.z = 11:15)
blu <- removepref(bli)
names(bli)
names(blu)
```
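The prefix pattern described above can be expressed as a single regular expression. strip_wx_prefix() is a hypothetical helper, not rextor's code: the pattern removes a leading ".wx." optionally followed by digits and a dot, and leaves other names untouched.

```r
# Hypothetical sketch; not the rextor implementation
strip_wx_prefix <- function(dataframe) {
  # "^\\.wx\\." matches the literal ".wx." at the start of a name;
  # "([0-9]+\\.)?" optionally matches a number followed by a dot
  names(dataframe) <- sub("^\\.wx\\.([0-9]+\\.)?", "", names(dataframe))
  dataframe
}

bli <- data.frame(.wx.1.ar = 1:5, y_ar = 6:10, .wx.z = 11:15)
names(strip_wx_prefix(bli))  # "ar" "y_ar" "z"
```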
This scale offers several color palettes to choose from and automatically uses a discrete or a continuous scale, depending on the data type of the mapped variable.
It can be used for both the color and the fill aesthetic: simply set the aesthetic argument accordingly inside the function.
```r
scale_rextor(pal = "cute", direction = 1, aesthetic = "color", ...)
```

| Argument | Description |
|---|---|
| pal | Color palette. Use either numbers from 1 to 9 or one of the names: … |
| direction | Direction of the color palette. Keep blank or use 1 for the default direction, or use -1 to reverse the colors. |
| aesthetic | Which aesthetic to use the scale on; either "color" or "fill". |
| ... | Other common scale parameters. |
ggplot plot object
```r
library(ggplot2)

ggplot(iris, aes(Species, Petal.Width, color = Petal.Width)) +
  geom_jitter() +
  theme_wob() +
  scale_rextor()

data <- data.frame(
  Time = rep(c("Time 1", "Time 2", "Time 3", "Time 4",
               "Time 5", "Time 6", "Time 7", "Time 8"), each = 2),
  cont = 1:16,
  Value = c(8, 5, 10, 3, 8, 5, 10, 3, 8, 5, 10, 3, 8, 5, 10, 3)
)

ggplot(data, aes(x = Time, y = Value, color = Time, fill = Time)) +
  geom_boxplot(alpha = 0.95) +
  theme_wob() +
  scale_rextor(pal = "neon") +
  scale_rextor(pal = "neon", aesthetic = "fill")

ggplot(data, aes(x = cont, y = Value, color = cont, fill = cont)) +
  geom_jitter(alpha = 0.8, size = 8) +
  theme_minimal() +
  scale_rextor(pal = "wextor") +
  scale_rextor(pal = "wextor", aesthetic = "fill")
```
Filter your data frame by the item used for the "seriousness check". If the data come from 'WEXTOR' and the automatically created variable has not been changed, you can accept all defaults of this function and simply pass in the data frame.
```r
serious_check(dataframe, varname = "seriousness", keep = "participate")
```

| Argument | Description |
|---|---|
| dataframe | Data that contain a variable for the seriousness check. |
| varname | Variable for the seriousness check. Defaults to "seriousness". |
| keep | Values of the variable to keep. Defaults to "participate". |
Filtered data
```r
data <- read_WEXTOR(path_to_file("BiFiX_data_raw.csv"))
serious_only <- serious_check(data)
```
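A minimal base-R sketch of this filter. serious_sketch() is hypothetical, and "just looking" is a made-up answer option; using %in% lets keep hold several accepted values and drops rows with a missing answer, which is an assumption about how the real function treats NA.

```r
# Hypothetical sketch; "just looking" is an illustrative answer option
serious_sketch <- function(dataframe, varname = "seriousness",
                           keep = "participate") {
  # %in% allows several accepted values and drops NA answers
  dataframe[dataframe[[varname]] %in% keep, , drop = FALSE]
}

answers <- data.frame(
  id = 1:4,
  seriousness = c("participate", "just looking", "participate", NA)
)
serious_sketch(answers)$id  # 1 3
```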
This function takes a data frame as input, as well as the minimum number of web pages in your web experiment or study that a participant should have seen for their participation to count as complete (or plausible). It flags the most extreme 10% of complete participations in the data as very slow or very fast.
```r
sess_length_check(dataframe, min_pages = 6)
```

| Argument | Description |
|---|---|
| dataframe | A data frame containing WEXTOR data (needs to contain variables …). |
| min_pages | Numeric. The minimum number of pages a participant must have visited in the study for their participation to be considered plausible. Defaults to 6. |
A data frame with two added variables: sess_length_clean, which contains the session length for completed participations and NA otherwise, and check_sess_length, which flags the most extreme 10% of times needed to finish participation.
```r
data <- read_WEXTOR(path_to_file("BiFiX_data_raw.csv"))
new_data <- sess_length_check(data)
```
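The flagging rule can be sketched with stats::quantile(). flag_sessions() and its labels "very fast" and "very slow" are hypothetical, and splitting the extreme 10% evenly into 5% per tail is an assumption; the documentation above does not say how the 10% is divided. The session lengths and page counts are made-up numbers.

```r
# Hypothetical sketch; assumes the 10% is split evenly (5% per tail)
flag_sessions <- function(sess_length, pages, min_pages = 6, prop = 0.10) {
  # Keep session lengths only for complete participations, NA otherwise
  clean <- ifelse(pages >= min_pages, sess_length, NA)
  bounds <- quantile(clean, probs = c(prop / 2, 1 - prop / 2),
                     na.rm = TRUE, names = FALSE)
  # NA lengths (incomplete participations) yield NA flags via ifelse()
  flag <- ifelse(clean < bounds[1], "very fast",
                 ifelse(clean > bounds[2], "very slow", "ok"))
  data.frame(sess_length_clean = clean, check_sess_length = flag)
}

sess  <- seq(10, 200, by = 10)  # session lengths in seconds (made up)
pages <- c(rep(6, 19), 3)       # the last participant stopped early
out <- flag_sessions(sess, pages)
table(out$check_sess_length, useNA = "ifany")
```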
This theme is built to be used with ggplot2 plots.
It offers several variants of a dark theme, i.e., a dark background with light text.
```r
theme_wob(max_font_size = 14, contrast = "high")
```

| Argument | Description |
|---|---|
| max_font_size | Maximum font size, used for the plot title (other font sizes are automatically adjusted accordingly). |
| contrast | Character to indicate the choice of color palette depending on the desired contrast. Defaults to "high". |
Plot theme.
```r
# Load the data
data <- read_WEXTOR(path_to_file("BiFiX_data_raw.csv"))

# Create any plot
library(ggplot2)
ggplot(data, aes(age, color = gender, fill = gender)) +
  geom_density(alpha = .5) +
  theme_wob(contrast = "rex")
```