Implementation of a generic Spark-based system to accelerate and optimize data migration from transactional systems to a data lake repository in a bank

Cruz Ruiz, Alex Julio

Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.14076/26622

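Title:	Implementación de un sistema genérico con Spark para acelerar y optimizar la migración de datos de los sistemas transaccionales hacia un repositorio Data Lake en una entidad bancaria [Implementation of a generic Spark-based system to accelerate and optimize data migration from transactional systems to a data lake repository in a bank]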
Authors:	Cruz Ruiz, Alex Julio
Advisors:	Sotelo Villena, Juan Carlos
Keywords:	Generic system with Spark; Data lake repository
Issue Date:	2023
Publisher:	Universidad Nacional de Ingeniería
Abstract:	Many organizations that handle large volumes of data, especially in the financial sector, are migrating their data to a data lake repository. One of the main obstacles is a lack of the technical knowledge needed to carry out the migrations. In addition, the teams that do have this knowledge cannot keep up with the large number of sources to be migrated, which affects the business teams that need the information. Finally, sources already migrated with other technologies suffer from cancelled processes or excessive execution times. The objective of this work is to implement a generic system, with the Spark data processing engine as its main component, to accelerate and optimize the migration of data from transactional systems to a data lake repository in a bank. The system is simple to use: data can be migrated by configuring parameters and applying basic SQL knowledge, so even business users could operate it. To carry out the implementation, requirements were gathered from the business user and the system user, and cases were chosen to test the system. Next, the origin and destination of the sources to be migrated in the test cases were identified, along with the data guidelines that could affect the implementation. With this information and some additional considerations, the data architecture was designed. Finally, the generic system was implemented and its operation was verified through the migration tests performed. In the source migrations tested, development time was reduced, which will speed up migrations, and the execution time of the migration processes was also reduced.
URI:	http://hdl.handle.net/20.500.14076/26622
Rights:	info:eu-repo/semantics/restrictedAccess
Appears in Collections:	Ingeniería de Sistemas
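
The abstract describes a system driven by configured parameters plus basic SQL. The full text is under restricted access, so the sketch below is only an illustration of that general idea, not the author's design: every name in it (`MigrationJob`, `build_job`, `describe`, and all parameter keys) is hypothetical, and it only validates a parameter set and lists the steps a Spark job might then perform.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical set of accepted write modes; the thesis does not list any.
ALLOWED_WRITE_MODES = {"overwrite", "append"}


@dataclass(frozen=True)
class MigrationJob:
    """Hypothetical parameter set for one source-to-data-lake migration."""
    source_table: str      # table in the transactional system
    target_path: str       # destination location in the data lake
    select_sql: str        # user-supplied SQL choosing what to migrate
    partition_column: str  # column used to split the written output
    write_mode: str = "overwrite"


def build_job(params: Dict[str, str]) -> MigrationJob:
    """Validate a plain parameter dictionary and turn it into a job."""
    required = ("source_table", "target_path", "select_sql", "partition_column")
    missing = [key for key in required if not params.get(key)]
    if missing:
        raise ValueError("missing parameters: " + ", ".join(missing))
    write_mode = params.get("write_mode", "overwrite")
    if write_mode not in ALLOWED_WRITE_MODES:
        raise ValueError("unsupported write_mode: " + write_mode)
    return MigrationJob(
        source_table=params["source_table"],
        target_path=params["target_path"],
        select_sql=params["select_sql"],
        partition_column=params["partition_column"],
        write_mode=write_mode,
    )


def describe(job: MigrationJob) -> List[str]:
    """List, in order, the steps an engine such as Spark would carry out."""
    return [
        "read " + job.source_table,
        "run: " + job.select_sql,
        "write to " + job.target_path
        + " partitioned by " + job.partition_column
        + " (" + job.write_mode + ")",
    ]
```

Under these assumptions, a user would supply only the dictionary of parameters and the SQL text; the generic code path stays the same for every source.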

Files in This Item:

File	Description	Size	Format
cruz_ra.pdf		9.29 MB	Adobe PDF
cruz_ra(acta).pdf		505.02 kB	Adobe PDF


This item is licensed under a Creative Commons License
