System Design Specification


Project team members

  • 杨钧涯
  • 王孟涵
  • 缪润杰
  • 苏泊鑫
  • 肖嘉皓

Document Change Log

Change Date | Changed By | Version | Change Description
04/23/2024 | Junya Yang | 1.0 | Prepared Document

Table of Contents

  1. DOCUMENT CHANGE LOG
  2. TABLE OF CONTENTS
  3. DESIGN OVERVIEW
  4. TOOLS AND STANDARDS
    1. Tools
    2. Standards
  5. USER INTERFACE DESIGN
    1. Usage Scenario I
    2. Usage Scenario II
  6. DIAGRAMS

DESIGN OVERVIEW

Design Overview for the Data Analysis Platform

The platform provides an end-to-end data analysis workflow that covers data preprocessing, modeling, and visualization. To keep the user experience robust and responsive, the system architecture separates the frontend and backend services.

Backend Architecture: The backend is built with FastAPI, a high-performance Python web framework, and handles the data-intensive operations. FastAPI supports asynchronous request handlers, so the server can keep accepting requests while long-running operations are in progress, which improves the platform's responsiveness.
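The asynchronous pattern described above can be sketched with the standard library alone. This is not FastAPI code: it is a minimal illustration, assuming Python 3.9+, of what an asynchronous request handler relies on. Blocking analysis work is moved to a worker thread with `asyncio.to_thread`, so the event loop stays free to serve other requests. The names `summarize_column` and `handle_request` are hypothetical placeholders, not part of the platform.

```python
import asyncio
import statistics


def summarize_column(values):
    """Blocking work; a hypothetical stand-in for a pandas or sklearn computation."""
    return {"count": len(values), "mean": statistics.fmean(values)}


async def handle_request(values):
    # Run the blocking function in a worker thread so the event loop is not blocked.
    return await asyncio.to_thread(summarize_column, values)


async def main():
    # Two simulated requests are awaited concurrently rather than strictly one after the other.
    return await asyncio.gather(
        handle_request([1.0, 2.0, 3.0]),
        handle_request([10.0, 20.0]),
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In FastAPI itself, a path operation declared with plain `def` instead of `async def` is run in an external threadpool automatically, which achieves a similar effect. For heavy pure-Python CPU work, a process pool may be a better fit than threads.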

Frontend Design: The user interface is built with Vue.js, a progressive JavaScript framework with a component-based architecture. This setup gives users a dynamic, intuitive interface to the data analysis functions as they upload and manipulate their data.

Core Functionalities:

  • Data Upload: Users can easily upload datasets via the frontend interface.
  • Data Preprocessing: The platform offers a suite of tools for cleaning and preparing data for analysis, ensuring data quality and readiness.
  • Modeling Capabilities: Users have access to a variety of built-in models, including regression and clustering, to uncover patterns and derive insights from their data.
  • Visualization Tools: Integrated visualization tools enable users to create engaging and informative visual representations of their analysis results.
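The upload and preprocessing steps listed above can be illustrated with a small standard-library sketch. The document does not specify which cleaning steps the platform offers, so dropping incomplete rows and min-max scaling are assumed examples; the function names are hypothetical. In the platform's own stack, pandas `DataFrame.dropna()` and scikit-learn's `MinMaxScaler` provide equivalent operations.

```python
import csv
import io


def load_csv(text):
    """Parse uploaded CSV text into a list of row dictionaries (hypothetical upload step)."""
    return list(csv.DictReader(io.StringIO(text)))


def drop_incomplete_rows(rows):
    """Keep only rows in which every field is a non-empty string.

    csv.DictReader fills missing trailing fields with None and collects
    surplus fields in a list, so both cases are treated as incomplete.
    """
    return [
        row for row in rows
        if all(isinstance(value, str) and value.strip() for value in row.values())
    ]


def min_max_scale(values):
    """Rescale numbers to the range [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    raw = "age,income\n25,3000\n,4000\n40,5000\n"
    rows = drop_incomplete_rows(load_csv(raw))  # the row with a missing age is dropped
    ages = [float(row["age"]) for row in rows]
    print(len(rows), min_max_scale(ages))
```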

Our platform is designed to be flexible, catering to a wide range of data analysis needs and making sophisticated data science accessible to users with varying levels of expertise.

Below, we introduce some of the models the platform provides.

Linear regression is a basic statistical model used to explore the linear relationship between variables. It assumes a straight-line relationship between the dependent variable and the independent variables and makes predictions from the best-fitting line. The model is fitted by the method of least squares, which chooses the coefficients that minimize the sum of squared residuals, and it is usually evaluated with metrics such as R² (the coefficient of determination), adjusted R², and the standard error of the estimate. In practice, the model's assumptions must be checked, including linearity, the absence of strong multicollinearity among the independent variables, and normally distributed error terms.
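The least-squares fit and the evaluation metrics named above can be sketched for the single-variable case in plain Python. This is an illustration, not the platform's implementation; the function names are hypothetical, and `p` denotes the number of independent variables. In scikit-learn, `LinearRegression` and `r2_score` cover the same ground for multiple variables.

```python
import math


def fit_simple_linear_regression(x, y):
    """Least-squares slope and intercept for one independent variable."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has zero variance; the slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def evaluate(x, y, slope, intercept, p=1):
    """Return (R², adjusted R², standard error of the estimate).

    Requires n > p + 1 observations and a non-constant y.
    """
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)  # penalizes extra predictors
    std_err = math.sqrt(ss_res / (n - p - 1))      # residual standard error
    return r2, adj_r2, std_err


if __name__ == "__main__":
    slope, intercept = fit_simple_linear_regression([1, 2, 3], [1, 2, 2])
    print(slope, intercept, evaluate([1, 2, 3], [1, 2, 2], slope, intercept))
```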

A decision tree is a non-parametric supervised learning algorithm used for both classification and regression tasks. It has a hierarchical tree structure consisting of a root node, branches, internal nodes, and leaf nodes. The tree starts at the root node, which has no incoming branches. The outgoing branches from the root node feed into the internal nodes, also known as decision nodes. Based on the available features, both node types evaluate the data and split it into increasingly homogeneous subsets, which end in leaf nodes (terminal nodes). The leaf nodes represent the possible outcomes within the dataset.
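The evaluation each decision node performs can be sketched as a search for the split that makes the child subsets most homogeneous. The text does not say which impurity measure the platform uses; this sketch assumes Gini impurity, which is the default criterion of scikit-learn's `DecisionTreeClassifier`, and handles a single numeric feature. The function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure node, larger for more mixed nodes."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Find the threshold t for the test `value <= t` that minimizes weighted Gini impurity.

    Returns (threshold, weighted_impurity); threshold is None if no split is possible.
    """
    n = len(labels)
    best_threshold, best_impurity = None, float("inf")
    # Candidate thresholds: every distinct value except the largest,
    # which would leave the right branch empty.
    for threshold in sorted(set(values))[:-1]:
        left = [label for v, label in zip(values, labels) if v <= threshold]
        right = [label for v, label in zip(values, labels) if v > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity


if __name__ == "__main__":
    print(best_split([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"]))
```

A full tree would apply this search across all features at each node and recurse into the two children until the nodes are pure or a depth limit is reached; the resulting pure nodes become the leaves.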

K-nearest neighbors (KNN) is an instance-based supervised learning algorithm whose performance is sensitive to the size and dimensionality of the data. It performs classification or regression by measuring the distance between a new sample and each known sample in the training set. In classification, it assigns the new sample to the category most common among its K nearest neighbors. In regression, it predicts the value of the new sample as the average of the values of its K nearest neighbors. The KNN algorithm works in three steps: calculate the distances, select the K nearest neighbors, and take a majority vote (classification) or compute the mean (regression).
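The three steps above can be sketched directly in plain Python. This minimal version assumes Euclidean distance (`math.dist`, Python 3.8+) and numeric feature tuples; the function names are hypothetical. scikit-learn's `KNeighborsClassifier` and `KNeighborsRegressor` implement the same idea with faster neighbor search.

```python
import math
from collections import Counter


def k_nearest(train_X, query, k):
    """Steps 1 and 2: compute all distances, then keep the indices of the k closest samples."""
    if k < 1:
        raise ValueError("k must be at least 1")
    distances = sorted((math.dist(x, query), i) for i, x in enumerate(train_X))
    return [i for _, i in distances[:k]]


def knn_classify(train_X, train_y, query, k=3):
    """Step 3 (classification): majority vote among the k nearest labels."""
    votes = Counter(train_y[i] for i in k_nearest(train_X, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(train_X, train_y, query, k=3):
    """Step 3 (regression): mean of the k nearest target values."""
    idx = k_nearest(train_X, query, k)
    return sum(train_y[i] for i in idx) / len(idx)


if __name__ == "__main__":
    X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    print(knn_classify(X, ["a", "a", "a", "b", "b", "b"], (0.2, 0.2)))
    print(knn_regress(X, [1, 1, 1, 10, 10, 10], (5.5, 5.5)))
```

Because every prediction scans the whole training set, the cost grows with the data size, which is one reason the algorithm is sensitive to the size of the data.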

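Design overview (summary): We will build a general-purpose data analysis platform that provides data preprocessing, modeling, visualization, and related functions. The platform uses a separated frontend/backend architecture: the backend uses the Python-based FastAPI framework, and the frontend uses Vue.js. Through our frontend interface, users upload data and select the operations they want to run, completing an entire data analysis workflow. We also provide commonly used models, including regression and clustering.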

Tools and Standards

Tools

In Python we use the NumPy, pandas, scikit-learn (sklearn), and Matplotlib libraries. The backend web service is built with FastAPI, and the frontend with Vue.js.

Standards

The platform is cross-platform: it can be used on macOS, Windows, and Linux, and is accessed through a web browser on all of them.

User Interface Design

User Interface

Diagrams
