feat: validate file formats in url #1606

Merged

Conversation

makram93
Contributor

@makram93 makram93 commented May 31, 2023

Solves: #1555

Solution:

  • Rely on the mime-types library to detect the MIME type from the URL, and raise an error if it is not the expected type for that class.
  • If auto-detection of the MIME type fails, check whether the file extension is among the extra extensions.
    • Check for exceptional cases; if there is still no match, raise the error.

CAVEAT: Any file without an extension will raise an error. For example, a URL pointing to a WAV audio file that has no extension will be rejected.
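
For readers skimming the description, the flow above can be sketched roughly as follows. This is an illustration only, not the code merged in this PR: it assumes Python's standard `mimetypes` module, and the names `validate_url_format`, `expected_main_type`, and `extra_extensions` are hypothetical. The "exceptional cases" step is left out because the description does not specify it.

```python
import mimetypes
import os
from typing import Iterable
from urllib.parse import urlparse


def validate_url_format(
    url: str,
    expected_main_type: str,
    extra_extensions: Iterable[str] = (),
) -> str:
    """Hypothetical helper: return `url` if its extension looks right, else raise.

    No request is made to the URL; only the path's extension is inspected.
    """
    path = urlparse(url).path

    # Step 1: guess the MIME type from the extension alone.
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is not None:
        if mime_type.split("/")[0] == expected_main_type:
            return url
        raise ValueError(
            f"expected a '{expected_main_type}' file, got '{mime_type}': {url}"
        )

    # Step 2: the guess failed, so fall back to a list of extra extensions
    # (assumed here to be given without the leading dot).
    extension = os.path.splitext(path)[1].lower().lstrip(".")
    if extension and extension in {e.lower() for e in extra_extensions}:
        return url

    # Step 3: still no match. This is also where extension-less files end up,
    # which is the caveat noted above.
    raise ValueError(f"could not validate the file format of: {url}")
```

With this sketch, `validate_url_format("https://example.com/clip.wav", "audio")` returns the URL unchanged, while `validate_url_format("https://example.com/clip", "audio")` raises a `ValueError` because there is no extension to inspect.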

Signed-off-by: Mohammad Kalim Akram <kalim.akram@jina.ai>
@makram93 makram93 changed the title from "fix: validate file formats in url" to "feat: validate file formats in url" May 31, 2023
@JoanFM
Member

JoanFM commented May 31, 2023

I think making a request to the URL is overkill, and it will make validation very slow.

@JoanFM JoanFM linked an issue Jun 14, 2023 that may be closed by this pull request
makram93 and others added 3 commits June 23, 2023 03:11
Signed-off-by: Mohammad Kalim Akram <kalim.akram@jina.ai>
Signed-off-by: Mohammad Kalim Akram <kalim.akram@jina.ai>
@jupyterjazz jupyterjazz changed the base branch from main to feat-file-validation June 26, 2023 08:59
@jupyterjazz jupyterjazz requested a review from JoanFM June 26, 2023 09:02
@jupyterjazz jupyterjazz marked this pull request as ready for review June 26, 2023 09:07
@jupyterjazz jupyterjazz merged commit 49fd592 into docarray:feat-file-validation Jun 26, 2023
JoanFM pushed a commit that referenced this pull request Jun 27, 2023
Signed-off-by: Mohammad Kalim Akram <kalim.akram@jina.ai>
Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
Co-authored-by: Mohammad Kalim Akram <kalim.akram@jina.ai>
@JoanFM JoanFM mentioned this pull request Jul 3, 2023
Development

Successfully merging this pull request may close these issues.

Url types are not aware of extension during validation
3 participants