Version v0.0 of the documentation is no longer actively maintained. The site that you are currently viewing is an archived snapshot. For up-to-date documentation, see the latest version.
Add metadata to datasets of individual human records
The section below renders a vignette article from the youthvars library.
Note: This vignette is illustrated with fake data. The dataset explored in this example should not be used to inform decision-making.
youthvars provides two classes, YouthvarsProfile and YouthvarsSeries, that are useful for describing features of datasets. The tools in youthvars build on the metadata included in a Ready4useDyad.
Ingest data
To start, we ingest X, a Ready4useDyad (dataset and data dictionary pair) that we can download from a remote repository.
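library(ready4)    # generics used below (ingest, ratify, renew, exhibit, depict, share)
library(ready4use) # Ready4useRepos and Ready4useDyad classes
library(youthvars) # YouthvarsProfile and YouthvarsSeries classes
X <- ready4use::Ready4useRepos(dv_nm_1L_chr = "fakes",
                               dv_ds_nm_1L_chr = "https://doi.org/10.7910/DVN/W95KED",
                               dv_server_1L_chr = "dataverse.harvard.edu") %>%
  ingest(fls_to_ingest_chr = "ymh_clinical_dyad_r4",
         metadata_1L_lgl = FALSE)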
Add metadata
We could add metadata about X, such as the unique identifier variable name, by transforming it to a YouthvarsProfile instance.
## Not run
# X <- YouthvarsProfile(a_Ready4useDyad = X,
# id_var_nm_1L_chr = "fkClientID")
However, in this case the data we ingested includes a longitudinal dataset. It is therefore preferable to transform X into a YouthvarsSeries instance. YouthvarsSeries objects contain all of the fields of YouthvarsProfile objects, plus additional fields that are specific to longitudinal datasets, for example:
- timepoint_var_nm_1L_chr, which specifies the name of the data-collection timepoint variable;
- timepoint_vals_chr, which specifies the values of that variable, and;
- participation_var_1L_chr, which specifies the desired name of a yet-to-be-created variable that will summarise the data-collection timepoints for which each unit record supplied data.
X <- YouthvarsSeries(a_Ready4useDyad = X,
                     id_var_nm_1L_chr = "fkClientID",
                     participation_var_1L_chr = "participation",
                     timepoint_vals_chr = c("Baseline","Follow-up"),
                     timepoint_var_nm_1L_chr = "round")
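To make the role of the participation variable more concrete, here is a minimal base R sketch of how such a variable could be derived from a long-format dataset. It does not use youthvars and is not the package's implementation: the toy data frame, the helper summarise_rounds and the "Baseline and Follow-up" label format are all assumptions made for illustration.

```r
# Toy longitudinal data: one row per client per data-collection round.
# Column names mirror the vignette's arguments; the values are made up.
toy_df <- data.frame(
  fkClientID = c("A", "A", "B", "C", "C"),
  round = c("Baseline", "Follow-up", "Baseline", "Baseline", "Follow-up"),
  stringsAsFactors = FALSE
)
timepoint_vals_chr <- c("Baseline", "Follow-up")

# Hypothetical helper: collapse the rounds one client reported into a label,
# keeping the order given in timepoint_vals_chr.
summarise_rounds <- function(rounds_chr) {
  present_chr <- timepoint_vals_chr[timepoint_vals_chr %in% rounds_chr]
  paste(present_chr, collapse = " and ")
}

# ave() applies the helper within each client and recycles the label
# across all of that client's rows.
toy_df$participation <- ave(toy_df$round, toy_df$fkClientID,
                            FUN = summarise_rounds)
print(toy_df)
```

In this sketch, client B, who only has a baseline record, is labelled differently from clients A and C, who reported at both rounds. That distinction is the kind of grouping that participation-based comparisons rely on.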
YouthvarsSeries methods
Currently, only methods for YouthvarsSeries (and not yet YouthvarsProfile) have been included with the youthvars package. These methods are summarised in the following sections.
Validate data
We use the ratify method to ensure that X has been appropriately configured for methods examining datasets reporting measures at two timepoints.
X <- ratify(X,
            type_1L_chr = "two_timepoints")
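The vignette does not spell out which checks ratify performs. As a rough illustration of the kind of configuration check that a two-timepoint analysis plausibly needs, the base R sketch below tests that exactly two timepoint values are declared, that every record uses one of them, and that no client has more than one record per timepoint. The function check_two_timepoints and its rules are hypothetical and are not taken from youthvars.

```r
# Hypothetical validation helper (not part of youthvars).
check_two_timepoints <- function(ds_df,
                                 id_var_nm_1L_chr,
                                 timepoint_var_nm_1L_chr,
                                 timepoint_vals_chr) {
  # Exactly two timepoint values must be declared.
  two_vals_lgl <- length(unique(timepoint_vals_chr)) == 2
  # Every record must use one of the declared timepoint values.
  values_ok_lgl <- all(ds_df[[timepoint_var_nm_1L_chr]] %in% timepoint_vals_chr)
  # Each client may appear at most once per timepoint.
  keys_df <- ds_df[, c(id_var_nm_1L_chr, timepoint_var_nm_1L_chr)]
  no_dupes_lgl <- !any(duplicated(keys_df))
  two_vals_lgl && values_ok_lgl && no_dupes_lgl
}

# Made-up data with the vignette's variable names.
valid_df <- data.frame(
  fkClientID = c("A", "A", "B"),
  round = c("Baseline", "Follow-up", "Baseline"),
  stringsAsFactors = FALSE
)
# Same data with client B's baseline record repeated.
invalid_df <- rbind(valid_df, valid_df[3, ])

valid_1L_lgl <- check_two_timepoints(valid_df, "fkClientID", "round",
                                     c("Baseline", "Follow-up"))
invalid_1L_lgl <- check_two_timepoints(invalid_df, "fkClientID", "round",
                                       c("Baseline", "Follow-up"))
```

Here valid_1L_lgl is TRUE and invalid_1L_lgl is FALSE, because the second data frame repeats client B's baseline record.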
Inspect data
We can now use the renewSlot and renew methods to specify the variables for which we would like to prepare descriptive statistics. The variables to be profiled are specified in arguments beginning with “compare_”. Use compare_ptcpn_chr to compare variables based on whether cases reported data at one or both timepoints, and compare_by_time_chr to compare the summary statistics of variables by timepoint, e.g. at baseline and follow-up. If you wish these comparisons to report p values, use the compare_ptcpn_with_test_chr and compare_by_time_with_test_chr arguments instead.
X <- renewSlot(X,
               "descriptives_ls",
               compare_by_time_chr = c("d_age","d_sexual_ori_s","d_studying_working"),
               compare_by_time_with_test_chr = c("k6_total", "phq9_total", "bads_total"),
               compare_ptcpn_with_test_chr = c("k6_total", "phq9_total", "bads_total")) %>%
  renew(type_1L_chr = "characterize")
The tables generated in the preceding step can be inspected using the exhibit method.
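For readers who want a feel for what a by-timepoint comparison with a p value involves, the following base R sketch computes a mean per timepoint and runs a paired t-test on a made-up k6_total variable. It does not call youthvars. The toy values and the choice of a paired t-test are assumptions, as the vignette does not state which tests the package applies.

```r
# Made-up scores for four clients measured at both timepoints.
# Rows are in the same client order within each round, which the
# paired test below relies on.
toy_df <- data.frame(
  fkClientID = rep(c("A", "B", "C", "D"), times = 2),
  round = rep(c("Baseline", "Follow-up"), each = 4),
  k6_total = c(12, 15, 9, 20, 10, 14, 9, 16),
  stringsAsFactors = FALSE
)

# Summary statistic (here, the mean) of the variable at each timepoint.
by_time_df <- aggregate(k6_total ~ round, data = toy_df, FUN = mean)
print(by_time_df)

# Stand-in significance test: paired t-test of baseline versus follow-up.
baseline_dbl <- toy_df$k6_total[toy_df$round == "Baseline"]
followup_dbl <- toy_df$k6_total[toy_df$round == "Follow-up"]
test_ls <- t.test(baseline_dbl, followup_dbl, paired = TRUE)
p_value_1L_dbl <- test_ls$p.value
```

A comparison without a test would stop after the aggregate step, while a comparison with a test would also report p_value_1L_dbl alongside the summary statistics.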
The depict method can create plots comparing numeric variables by timepoint.
depict(X,
       type_1L_chr = "by_time",
       var_nms_chr = c("c_sofas"),
       label_fill_1L_chr = "Time",
       labels_chr = c("SOFAS"),
       y_label_1L_chr = "")
#> Warning: `stat(width * density)` was deprecated in ggplot2 3.4.0.
#> ℹ Please use `after_stat(width * density)` instead.
#> ℹ The deprecated feature was likely used in the youthvars package.
#> Please report the issue to the authors.
Share data
If and only if the dataset you are working with is appropriate for public dissemination (e.g. it is synthetic data), you can use the following workflow to share it. We can share the dataset we created for this example using the share method, specifying the repository to which we wish to publish the dataset (and for which we have write permissions) in a Ready4useRepos object.
Y <- Ready4useRepos(gh_repo_1L_chr = "ready4-dev/youthvars", # Replace with your repository
                    gh_tag_1L_chr = "Documentation_0.0")     # (need write permissions).
Y <- share(Y,
           obj_to_share_xx = X,
           fl_nm_1L_chr = "ymh_YouthvarsSeries")
X is now available for download as the file ymh_YouthvarsSeries.RDS from the “Documentation_0.0” release of the youthvars package.