Dates and years in the responses
Dates mentioned in the responses can point to important information that the existing response fields do not capture. If certain dates appear repeatedly in the text, it is time to open a dedicated field for them, which makes the monitoring process more robust and efficient.
The tool used to find and extract dates in the text is regular expressions. Regular expressions[9], RegEx for short, are character patterns used to search and manipulate strings.
Although the most common pattern to write dates in Spanish follows the rule of day - month - year, not all dates are written in a consistent pattern.
As a rule of thumb, writing dates in the ISO 8601 format[10], an international standard for dates, in future responses would make this analysis easier.
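As an illustration of what adopting ISO 8601 would look like, the following minimal sketch converts day-month-year strings to the ISO form (YYYY-MM-DD) in base R. The input values are illustrative and are not taken from the responses.

```r
# Dates written day/month/year, as is common in Spanish text (illustrative values).
written <- c("25/09/2019", "01/10/2019")

# Parse with an explicit format so that day and month cannot be confused.
parsed <- as.Date(written, format = "%d/%m/%Y")

# Re-write the dates in ISO 8601 form: four-digit year, month, day.
iso <- format(parsed, "%Y-%m-%d")
print(iso)
```

Once dates are stored this way, they sort correctly as plain text and no longer depend on the writer's day/month ordering.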
Because the dates are not written consistently, we set up a fairly complex pattern to extract them, covering the cases listed below.
Explanation of the sequences (descriptions adapted from the POSIX standard[11]):
sequence | explanation |
---|---|
(sign) | a sign such as ~!#$%^&*()_-+= etc. |
d | day of the month as decimal number (01–31) |
m | month as decimal number (01–12) |
Y | year |
B | locale’s full month name (e.g., January) |
b | locale’s abbreviated month name (e.g., Jan) |
You can try any of these sequences in R with `format`, for example `format(as.Date("25-09-2019", format = "%d-%m-%Y"), "%B")`.
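The sequences in the table can be tried one at a time. The sketch below is only an illustration with an arbitrary date; the output of `%B` and `%b` depends on the locale of the R session, so no particular month name is assumed here.

```r
# An arbitrary example date, given in ISO form so no parsing format is needed.
x <- as.Date("2019-09-25")

# Numeric sequences give the same result in every locale.
parts <- c(d = format(x, "%d"), m = format(x, "%m"), Y = format(x, "%Y"))
print(parts)

# Month names follow the session locale (for example English or Spanish).
month_full <- format(x, "%B")
month_abbr <- format(x, "%b")
print(c(B = month_full, b = month_abbr))
```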
index | format | variants | RegEx | RegEx description |
---|---|---|---|---|
1 | Y | 2019 | `(19\|20)…` | |
2 | B | septiembre | `(?:enero\|febrero\|…` | |
3 | (d or m)(sign)(m or d)(sign)Y | 25/09/2019, 25-09-2019, 09-25-2019 etc. | `…[-\|/][-\|…` | |
4 | B Y | septiembre 2019, sep 2019 | `(?:ene(?:ro)?\|feb(?:rero)?\|…` | |
5 | (B or b) d (sign) Y | julio 19 2019, jul 9, 2019, jul 9. 2019 | `(?:ene(?:ro)?\|feb(?:rero)?\|…` | |
6 | d de B Y | 28 de marzo 2019 | `…+de+(?:ene(?:ro)?\|feb(?:rero)?\|…` | |
7 | d de B de Y | 16 de octubre de 2019 | `…+de+(?:ene(?:ro)?\|feb(?:rero)?\|…` | |
8 | d de B del Y | 1 de octubre del 2019 | `…+de+(?:ene(?:ro)?\|feb(?:rero)?\|…` | |
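To make the idea behind the table concrete, here is a minimal sketch of two such patterns in base R. It is not the pattern used for the report: the expressions, the helper `extract_dates`, and the example sentence are all assumptions made for illustration, covering only the numeric form (case 3) and the "d de B (de/del) Y" forms (cases 6 to 8).

```r
# Month alternation: the optional group lets "sep" match as well as "septiembre".
months <- paste0(
  "(?:ene(?:ro)?|feb(?:rero)?|mar(?:zo)?|abr(?:il)?|may(?:o)?|jun(?:io)?|",
  "jul(?:io)?|ago(?:sto)?|sep(?:tiembre)?|oct(?:ubre)?|nov(?:iembre)?|dic(?:iembre)?)"
)

# Hypothetical patterns, written for this sketch only.
patterns <- c(
  numeric = "\\b\\d{1,2}[-/]\\d{1,2}[-/](?:19|20)\\d{2}\\b",
  long    = paste0("\\b\\d{1,2}\\s+de\\s+", months,
                   "(?:\\s+del?)?\\s+(?:19|20)\\d{2}\\b")
)

# Return every match of one pattern in one string (PCRE, case-insensitive).
extract_dates <- function(pattern, text) {
  regmatches(text, gregexpr(pattern, text, perl = TRUE, ignore.case = TRUE))[[1]]
}

# Illustrative sentence, not taken from the responses.
text <- "Entrega el 25/09/2019; taller el 16 de octubre de 2019 y el 1 de octubre del 2019."
found <- lapply(patterns, extract_dates, text = text)
print(found)
```

Keeping one pattern per case, rather than one very long expression, makes it easier to see which written form each extracted date follows.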
After the dates are extracted, they are displayed in context: the position of each date is shown within its sentence or paragraph, together with the adjacent words.
Spanish month names and their abbreviated versions in the POSIX standard:
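A display of this kind can be sketched as follows. The helper `show_context` is hypothetical and uses a fixed window of characters around each match rather than whole words, which is a simplification; the example sentence is illustrative.

```r
# Show each match of `pattern` in `text` with `width` characters on either side.
show_context <- function(text, pattern, width = 20) {
  m <- gregexpr(pattern, text, perl = TRUE)[[1]]
  if (m[1] == -1) return(character(0))           # no match found
  starts <- as.integer(m)
  ends   <- starts + attr(m, "match.length") - 1
  left   <- substring(text, pmax(1, starts - width), starts - 1)
  hit    <- substring(text, starts, ends)
  right  <- substring(text, ends + 1, pmin(nchar(text), ends + width))
  paste0("...", left, "[", hit, "]", right, "...")  # brackets mark the date
}

# Illustrative sentence, not taken from the responses.
ctx <- show_context("Se entregaron kits en mayo de 2019 a las familias.",
                    "(?:19|20)\\d{2}", width = 10)
print(ctx)
```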
full | abbreviated |
---|---|
enero | ene |
febrero | feb |
marzo | mar |
abril | abr |
mayo | may |
junio | jun |
julio | jul |
agosto | ago |
septiembre | sep |
octubre | oct |
noviembre | nov |
diciembre | dic |
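Since every abbreviation in this table is the first three letters of the full name, a month alternation of the form `ene(?:ro)?` can be generated rather than typed by hand. The sketch below is one possible way to do that; the variable names are chosen for illustration.

```r
# Full Spanish month names, as listed in the table above.
full <- c("enero", "febrero", "marzo", "abril", "mayo", "junio", "julio",
          "agosto", "septiembre", "octubre", "noviembre", "diciembre")

abbr <- substr(full, 1, 3)   # "ene", "feb", ..., "dic"
rest <- substring(full, 4)   # remainder of each name, made optional below

# Join into one non-capturing alternation: (?:ene(?:ro)?|feb(?:rero)?|...)
month_regex <- paste0("(?:", paste0(abbr, "(?:", rest, ")?", collapse = "|"), ")")
print(month_regex)

# Check which strings are exactly a full or abbreviated month name.
is_month <- grepl(paste0("^", month_regex, "$"),
                  c("sep", "septiembre", "sept"), perl = TRUE)
print(is_month)
```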
## # A tibble: 6 x 13
## folderName formName formNameRecode Month recordId question response
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Objectivo… Salud Salud 2019… jvnuye0… Cualita… "233 Eq…
## 2 Objectivo… Alojami… Alojamiento 2019… jwtsuzg… Qualita… En el a…
## 3 Objectivo… Alojami… Alojamiento 2019… jwv53s7… Qualita… "En el …
## 4 Objectivo… Alojami… Alojamiento 2019… s078965… Qualita… "Se apo…
## 5 Objectivo… Alojami… Alojamiento 2019… s186761… Qualita… "Se apo…
## 6 Objectivo… Necesid… Necesidades 2019… s196277… Cualita… provisi…
## # … with 6 more variables: description <chr>, partnerName <chr>,
## # subPartnerName <chr>, province <chr>, canton <chr>,
## # reportingUsers <list>
The dates found are as follows:
value |
---|
2019 |
2018 |
2023 |
1973 |
2086 |
1956 |
2016 |
2030 |
2020 |
1948 |
2014 |
mayo |
enero |
junio |
marzo |
julio |
agosto |
febrero |
abril |
septiembre |
octubre |
septiembre 2019 |
abril 2019 |
marzo 2019 |
28 de marzo 2019 |
28 de mayo de 2019 |
23 de julio de 2019 |
28 de febrero de 2019 |
14 de mayo de 2019 |
28 de junio de 2019 |
21 de mayo de 2019 |
8 de agosto de 2019 |
30 de enero de 2019 |
10 de abril de 2019 |
14 de agosto del 2019 |
The most common dates used in the responses:
value | n |
---|---|
mayo | 67 |
agosto | 52 |
junio | 39 |
marzo | 39 |
2019 | 36 |
julio | 30 |
abril | 26 |
febrero | 26 |
enero | 20 |
septiembre | 6 |
2020 | 4 |
abril 2019 | 4 |
2018 | 3 |
28 de febrero de 2019 | 2 |
10 de abril de 2019 | 1 |
14 de agosto del 2019 | 1 |
14 de mayo de 2019 | 1 |
1948 | 1 |
1956 | 1 |
1973 | 1 |
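A frequency table like the one above can be produced by counting the extracted strings. The sketch below uses base R and a small made-up vector, so the numbers it prints are not those of the report.

```r
# Toy vector of extracted date strings (hypothetical, for illustration only).
found <- c("mayo", "2019", "mayo", "agosto", "abril 2019", "mayo", "agosto")

# Count each distinct value and order from most to least frequent.
counts <- sort(table(found), decreasing = TRUE)

# Present the result with the same column names as the table above.
result <- data.frame(value = names(counts), n = as.integer(counts),
                     stringsAsFactors = FALSE)
print(result)
```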
[9] More precisely, the RegEx engine referred to here is Perl-Compatible Regular Expressions (PCRE), which R supports. https://www.pcre.org/original/doc/html/ Accessed September 9, 2019.
[10] ISO 8601 Date and Time Format. https://www.iso.org/iso-8601-date-and-time-format.html Accessed September 9, 2019.
[11] POSIX 1003.1, man page for date (posix section 1p). https://www.unix.com/man-page/posix/1p/date/ Accessed September 9, 2019.