Women Writers Project

Iker Acosta Venegas, Yidi Wang, & Yu Xiao

Infographic

Motivation

The Women Writers Project is a program dedicated to collecting and promoting writings by women from the 16th to the 19th centuries. The group aims to provide an inclusive and diverse perspective on the historical and literary landscape by including works from women of different races, cultures, and social backgrounds. Its ultimate goal is to make these writings available to the public, especially to college students for research. Our team worked with the Women Writers Project to create visualizations that represent various aspects of the collected writings: their genres, publication dates and locations, authors, author genders, and citations. By presenting these trends and patterns visually, we hope to communicate important insights to the public and to academic communities. Our work also aligns with the Women Writers Project's mission to promote and preserve historical texts and to highlight the contributions of women writers throughout history. The project is a reminder of the significance of women's voices in literature and history, and through our visualizations we aim to inspire and facilitate further research on, and appreciation of, these works while also promoting them. We chose the Women Writers Project because of the period its writings cover. Between the 16th and 19th centuries, women who took part in education-related activities were regarded as unusual. Gender roles were clearly established, but this did not stop women from producing excellent writing. Our motivation comes from the bravery these women showed in breaking social norms.

Data

The data we are working with was collected by the Women Writers Project. We met with members of the organization several times to clarify the topics we would focus on, which include genres such as classical drama, novels, poetry, and sacred texts. Direct communication with our partners let us ask data-related questions and gain a better understanding of the dataset, and we are grateful for the feedback they gave on our drafts as we worked with them toward a better final product. One advantage of these datasets is that they contain no missing values, since the Women Writers Project strives to preserve the value of the writings. However, the datasets are stored as JSON files organized by genre, a format none of us had used before. To work around this, we loaded the data into Tableau and viewed it in a spreadsheet format, which helped us understand the data types within the datasets. The datasets contain information about works and their references within other works: a unique identifier for the work, a list of referenced works with their unique identifiers, the number of times each is referenced within the work, and their full titles. They also include the total number of gestures made in the work, the different types of gestures made (and their counts), a description of the work, and its publication date. Notably, all counts in the datasets are greater than or equal to one, indicating that our partner only collected data from the works being referenced.
The publication date is ordinal, total gestures and counts are quantitative, and all other fields are categorical. During one meeting with our partner, we had difficulty identifying an appropriate final product and dataset for our project. Our partner then supplied us with a bibliography dataset, which enabled us to develop our second final product, a Sankey visualization showing the relationship between author gender and location.
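
Because the datasets arrive as JSON rather than as a spreadsheet, one way to inspect them is to flatten each work's reference list into rows. The sketch below is illustrative only: the field names (`id`, `date`, `total_gestures`, `references`, `title`, `count`) and the sample record are assumptions, not the Women Writers Project's actual schema.

```python
import json

# Hypothetical sample; the real field names in the WWP files may differ.
SAMPLE = """
[
  {
    "id": "work-001",
    "date": 1650,
    "total_gestures": 3,
    "references": [
      {"id": "ref-010", "title": "Example Referenced Work A", "count": 2},
      {"id": "ref-011", "title": "Example Referenced Work B", "count": 1}
    ]
  }
]
"""


def flatten_references(works):
    """Return one flat row (dict) per (work, referenced work) pair."""
    rows = []
    for work in works:
        for ref in work.get("references", []):
            rows.append({
                "work_id": work["id"],
                "date": work["date"],
                "total_gestures": work["total_gestures"],
                "ref_id": ref["id"],
                "ref_title": ref["title"],
                "ref_count": ref["count"],
            })
    return rows


rows = flatten_references(json.loads(SAMPLE))
for row in rows:
    print(row)
```

Each row then corresponds to one line of the spreadsheet view described above, with the work-level fields repeated for every referenced work.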

Task Analysis

| Domain Task | Analytic Task (Low-level, "Query") | Search Task (Mid-level) | Analyze Task (High-level) |
|---|---|---|---|
| Throughout all the publications, find the works that have more than 1 gesture. | Retrieve value, Filter | Locate | Discover, Present |
| Find the different publications throughout time and count them. | Filter | Browse | Discover, Present |
| Find the most popular genre of women's work in the early modern era. | Sort, Find Extremum | Locate | Discover |
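
The three domain tasks in the table map onto simple operations: a filter, a grouped count, and a find-extremum. A minimal sketch over made-up records follows; the field names `title`, `year`, `genre`, and `gestures` and all values are hypothetical.

```python
from collections import Counter

# Made-up records for illustration only.
works = [
    {"title": "Work A", "year": 1620, "genre": "sacred", "gestures": 1},
    {"title": "Work B", "year": 1620, "genre": "poetry", "gestures": 4},
    {"title": "Work C", "year": 1655, "genre": "poetry", "gestures": 2},
]

# Task 1 (retrieve value, filter): works with more than one gesture.
multi_gesture = [w["title"] for w in works if w["gestures"] > 1]

# Task 2 (filter, browse): number of publications per year.
per_year = Counter(w["year"] for w in works)

# Task 3 (sort, find extremum): the genre with the most works.
top_genre, top_count = Counter(w["genre"] for w in works).most_common(1)[0]

print(multi_gesture)
print(dict(per_year))
print(top_genre, top_count)
```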

Data Analysis

An interesting discovery from our first visualizations is that the count of published books drops sharply in the decade beginning in 1800 and then returns to its normal level in the decade beginning in 1810. One theory is that the Napoleonic Wars, which began in this decade, account for the drop: they were fought throughout Europe as well as in Egypt and the Caribbean, and women may have spent their time supporting the war effort in whatever way they could, leaving them little time to publish. We find this interesting because it suggests how significant historical events affected the number of publications by women. Our second visualization also yielded intriguing findings. Although our focus is the Women Writers Project, the dataset includes male authors. We set out to identify the location of every writer in our dataset and found that most male writers are associated with London, England, though male writers as a group are spread across a more diverse set of locations. Most female writers are also from London, with a few exceptions. Given our focus on writing from the 16th to the 19th centuries, it is expected that many of the locations are major cities of the time. The most interesting aspect of this graph, however, is that the majority of female writing occurred in England, with very little elsewhere, whereas male writers are more dispersed, although a majority are still in England.
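
A drop like the one described above becomes visible only after publication years are grouped into decades. The sketch below shows that binning on invented years rather than the project's data; the helper names `decade_counts` and `dips` and the 50% threshold are assumptions chosen for illustration.

```python
from collections import Counter


def decade_counts(years):
    """Count publications per decade, e.g. 1807 is counted under 1800."""
    return Counter((year // 10) * 10 for year in years)


def dips(counts, ratio=0.5):
    """Return decades whose count is below `ratio` times both neighbours."""
    found = []
    for decade in sorted(counts):
        before = counts.get(decade - 10)
        after = counts.get(decade + 10)
        if before is None or after is None:
            continue  # need a recorded decade on each side to compare
        if counts[decade] < ratio * before and counts[decade] < ratio * after:
            found.append(decade)
    return found


# Invented publication years, not WWP data.
years = [1791, 1793, 1795, 1798, 1804, 1811, 1812, 1815, 1819]
counts = decade_counts(years)
print(sorted(counts.items()))
print(dips(counts))
```

A check like this only flags a candidate decade; explaining the dip still requires historical context.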

Design Process

To begin our brainstorming process, we generated approximately 10 sketches on paper. From these, we ultimately selected two visualizations: a comprehensive grouped bar chart and a Sankey diagram. On our professor's advice, we integrated the interactive features into a single plot. Halfway through the project, we created digital sketches using UI tools to facilitate communication with our data provider, the Women Writers Project, who gave us guidance on improving the accessibility of our color encoding. We also adjusted how publication date is measured to capture overall trends within the dataset. Finally, we refined the two visualizations using various Python packages to present them in the most polished and effective way possible.

Data Visualization(s)

Count of Genre Through The Years

This visualization is a line chart that displays the number of publications over time, broken down by genre. It shows trends in the data over time and, like our first visualization, indicates that the sacred genre was the most popular in the earlier years, while poetry became increasingly popular during the 17th century. It is interesting to see how the prevailing interests of a period affect what is written: religion was very prominent at the time, and sacred texts are correspondingly common. The same holds for poetry, whose boom fell in these years and appears as a rise in poetry writings in the chart. These patterns emerge only when the whole time span is viewed at once rather than year by year, which is why we decided to add this graph.

Count of Genre Through The Years

We chose a grouped bar chart over a stacked bar chart or a line chart because we wanted to display several interactive features at once, including the pop-up table of descriptions. A line chart does not pair clearly or intuitively with the table, and a stacked bar chart makes it hard to compare the counts of works in a particular genre. The chart visualizes the count of documents in each genre published over time. The x-axis lists the genres; each bar represents one genre and is color-coded accordingly, and its height gives the count of documents in that genre. The animation feature lets the user see how the counts change over time. The chart also includes hover data that displays additional information about each bar, such as the descriptions of the documents in that genre.
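
An animated grouped bar chart generally needs one row per genre in every time step, including genres with no works in that step; otherwise bars can appear and disappear between frames. The sketch below builds such a zero-filled table with the standard library. The records and the helper name `frame_rows` are hypothetical, and the plotting call itself is left out.

```python
from collections import Counter

# Hypothetical (decade, genre) pairs, one per document.
documents = [
    (1600, "sacred"), (1600, "sacred"), (1600, "poetry"),
    (1610, "poetry"), (1610, "drama"),
]


def frame_rows(docs):
    """Return zero-filled (decade, genre, count) rows: one per bar per frame."""
    counts = Counter(docs)  # missing (decade, genre) keys count as 0
    decades = sorted({decade for decade, _ in docs})
    genres = sorted({genre for _, genre in docs})
    return [(d, g, counts[(d, g)]) for d in decades for g in genres]


rows = frame_rows(documents)
for row in rows:
    print(row)
```

In this long form, the first column selects the animation frame, the second the bar, and the third the bar height.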

Work by Gender and Location

We initially planned to generate a network, but most works are cited only once, so we concluded that the resulting network would not be very meaningful. We had built a Sankey diagram for an assignment in another class, and this experience made us more confident about using this type of visualization again. This time we use the bibliography dataset and link the gender of each author to the location where the book was published. The Sankey diagram visualizes the relationship between gender and geographical location for literary works by women writers and the works they reference. The data reveals a predominant concentration of these works in London, UK, with a total of 2.39k works. Within this location, male authors contributed 1.92k works, while female authors contributed 1.08k works, showcasing the gender dynamics in the literary landscape.
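
A two-level Sankey diagram of this kind is defined by a list of node labels plus parallel lists of link sources, targets, and values. The sketch below derives those lists from (gender, location) pairs. The sample pairs and the helper name `sankey_lists` are invented for illustration and do not reproduce the counts in this report.

```python
from collections import Counter

# Invented (author gender, publication location) pairs.
records = [
    ("female", "London"), ("female", "London"), ("female", "Dublin"),
    ("male", "London"), ("male", "Paris"),
]


def sankey_lists(pairs):
    """Build node labels and link index lists for a two-level Sankey.

    Assumes no gender label is identical to a location label.
    """
    genders = sorted({g for g, _ in pairs})
    places = sorted({p for _, p in pairs})
    labels = genders + places
    index = {label: i for i, label in enumerate(labels)}
    sources, targets, values = [], [], []
    for (gender, place), n in sorted(Counter(pairs).items()):
        sources.append(index[gender])  # link starts at the gender node
        targets.append(index[place])   # and ends at the location node
        values.append(n)               # link width = number of works
    return labels, sources, targets, values


labels, sources, targets, values = sankey_lists(records)
print(labels)
print(list(zip(sources, targets, values)))
```

Lists in this shape can be passed to Plotly's `go.Sankey` trace (as `node=dict(label=...)` and `link=dict(source=..., target=..., value=...)`), assuming that library is used.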

Conclusion

Our team applied our data science skills to the Women Writers Project, which proved to be an enjoyable and meaningful experience. It was our first time using programming skills for a real-world project involving actual people, and we were motivated by the opportunity to make a positive impact in our community. To produce a quality final product, we maintained constant communication with our partners so that everyone was aligned on our approach.

After developing our drafts, we met with our partners to discuss their feedback and incorporate their suggestions, which was essential in determining the direction of our final product. In the end, we created three visualizations that effectively showcase data from the Women Writers Project. Our first visualization is a bar chart that shows the number of books by year, broken down by genre. It revealed that there were no more than ten publications in each genre per year. It also highlighted that the sacred genre was one of the most popular genres, if not the most popular, which we attribute to the prevalence of Roman Catholicism at the time. However, poetry became the most popular genre by the end of the period we analyzed, reflecting its popularity during the 17th century.

Our second visualization is a Sankey diagram that displays the relationship between author gender and location, highlighting that the majority of publications came from London, one of the largest cities at the time. This provided an interesting insight into how the city's importance was reflected in the amount of writing from that location. Finally, our third visualization is a line chart that displays the number of publications over time, broken down by genre. It shows trends in the data over time and, like our first visualization, indicates that the sacred genre was the most popular in the earlier years, while poetry became increasingly popular during the 17th century. Overall, working on the Women Writers Project was a gratifying experience that allowed us to use our data science skills in a real-world context and produce something beneficial for our community. Analyzing historical trends such as Catholicism and poetry also helped us understand how such factors can influence the amount of writing dedicated to specific topics.